Re: [ZODB-Dev] BTree pickle size

2008-08-28 Thread Dieter Maurer
Roché Compaan wrote at 2008-8-24 14:00 +0200:
>This is the fsdump output for a single IOBTree:
>
>  data #00032 oid=1bac size=5435 class=BTrees._IOBTree.IOBTree
>
>What is persisted as part of the 5435 bytes? References to containing
>buckets? What else?

For optimization reasons, a small "IOBTree" that consists of a single
bucket is in fact essentially an "IOBucket".

This means that the "IOBTree" above can in fact contain
up to 60 integers with corresponding values (Python objects).



-- 
Dieter
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  [email protected]
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] BTree pickle size

2008-08-28 Thread Dieter Maurer
Roché Compaan wrote at 2008-8-25 17:36 +0200:
>On Sun, 2008-08-24 at 08:55 +0200, Roché Compaan wrote:
>> Thanks for the feedback. I'll re-run the tests without any text indexes,
>> as well as run it with other implementations such as TextIndexNG3 and
>> SimpleTextIndex and compare the results.
>> 
>
>Some more tests show that text indexes aren't the worst offenders. Date
>and DateRangeIndexes use IISet in cases where IITreeSet seems more
>appropriate. To me there isn't much more value in investigating other text
>index implementations. I'd rather spend the time comparing the overall
>results with other indexing implementations altogether, like Solr or
>indexing in an RDBMS.
>
>Listed below are some stats (where I ran my original test in which I
>create 1 documents) that compare an unmodified setup, a catalog
>without text indexes, a catalog without date indexes, a catalog without
>metadata and no catalog at all.
>
>Total size of default setup:  2569.97 MB
>Total size excluding text indexes:  1963.89 MB

This means text indexes cost about 600 MB (25 %).

>Total size excluding date range indexes:  2043.26 MB

This means range indexes cost about 500 MB.
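
As a quick check of the two estimates above, the differences can be computed directly from the quoted totals. This is only arithmetic on the posted figures; the helper name index_cost is made up for illustration.

```python
# Totals in MB, copied from the figures quoted above.
default_total = 2569.97
without_text = 1963.89
without_date_range = 2043.26


def index_cost(total, total_without):
    """Return (absolute cost in MB, share of the full total in percent)."""
    cost = total - total_without
    return cost, 100.0 * cost / total


text_cost, text_share = index_cost(default_total, without_text)
range_cost, range_share = index_cost(default_total, without_date_range)

print(f"text indexes:  {text_cost:.2f} MB ({text_share:.1f} %)")
print(f"range indexes: {range_cost:.2f} MB ({range_share:.1f} %)")
```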

You may consider a "Managable RangeIndex" instead of the standard
range indexes.

With "Managable RangeIndex" a "DateRangeIndex" is implemented
as a "RangeIndex" with data type "DateInteger" or "DateTimeInteger".


If you also use "dm.incrementalsearch" with "Products.AdvancedQuery",
then you can replace the range indexes (expensive both in terms of
storage and of runtime) by incremental filtering, which may not only
save a lot of space but can also give dramatic speed improvements.



-- 
Dieter


Re: [ZODB-Dev] BTree pickle size

2008-08-25 Thread Roché Compaan
On Sun, 2008-08-24 at 08:55 +0200, Roché Compaan wrote:
> Thanks for the feedback. I'll re-run the tests without any text indexes,
> as well as run it with other implementations such as TextIndexNG3 and
> SimpleTextIndex and compare the results.
> 

Some more tests show that text indexes aren't the worst offenders. Date
and DateRangeIndexes use IISet in cases where IITreeSet seems more
appropriate. To me there isn't much more value in investigating other text
index implementations. I'd rather spend the time comparing the overall
results with other indexing implementations altogether, like Solr or
indexing in an RDBMS.

Listed below are some stats (where I ran my original test in which I
create 1 documents) that compare an unmodified setup, a catalog
without text indexes, a catalog without date indexes, a catalog without
metadata and no catalog at all.

Total size of default setup:  2569.97 MB
Total size excluding text indexes:  1963.89 MB
Total size excluding date range indexes:  2043.26 MB
Total size excluding catalog:  270.90 MB


Unmodified setup (default portal_catalog):
==
Classname, Count, Size (kB), Size (MB), Avg (kB)

Products.PlonePAS.tools.memberdata.MemberData, 1, 0, 0, 0
OFS.Folder.Folder, 1, 537, 0.52, 0.05
BTrees.Length.Length, 220107, 6382, 6.23, 0.03
Products.Archetypes.BaseUnit.BaseUnit, 3, 7504, 7.33, 0.25
Persistence.mapping.PersistentMapping, 2, 8261, 8.07, 0.41
Products.ATContentTypes.content.document.ATDocument, 1, 15077, 14.72, 1.51
BTrees._OIBTree.OIBTree, 4613, 23008, 22.47, 4.99
BTrees._IIBTree.IITreeSet, 49383, 25975, 25.37, 0.53
BTrees._OOBTree.OOBTree, 15875, 52354, 51.13, 3.3
BTrees._IIBTree.IIBTree, 143942, 53566, 52.31, 0.37
BTrees._OOBTree.OOBucket, 115332, 70789, 69.13, 0.61
BTrees._IOBTree.IOBTree, 25645, 71072, 69.41, 2.77
BTrees._OIBTree.OIBucket, 132417, 101472, 99.09, 0.77
BTrees._IIBTree.IIBucket, 252121, 163524, 159.69, 0.65
BTrees._IOBTree.IOBucket, 655025, 1007623, 984.01, 1.54
BTrees._IIBTree.IISet, 640686, 1024506, 1000.49, 1.6

Total size: 2569.97 MB


No text indexes:

Classname, Count, Size (kB), Size (MB), Avg (kB)

OFS.Folder.Folder, 1, 537, 0.52, 0.05
BTrees.Length.Length, 120077, 3397, 3.32, 0.03
Products.Archetypes.BaseUnit.BaseUnit, 3, 7504, 7.33, 0.25
Persistence.mapping.PersistentMapping, 2, 8242, 8.05, 0.41
Products.ATContentTypes.content.document.ATDocument, 1, 15057, 14.7, 1.51
BTrees._OIBTree.OIBTree, 3467, 15740, 15.37, 4.54
BTrees._IIBTree.IITreeSet, 45734, 24355, 23.78, 0.53
BTrees._IOBTree.IOBTree, 20895, 37559, 36.68, 1.8
BTrees._OOBTree.OOBTree, 15878, 52460, 51.23, 3.3
BTrees._OOBTree.OOBucket, 115336, 70789, 69.13, 0.61
BTrees._OIBTree.OIBucket, 92499, 75572, 73.8, 0.82
BTrees._IOBTree.IOBucket, 28, 670918, 655.19, 2.33
BTrees._IIBTree.IISet, 644224, 1028891, 1004.78, 1.6

Total size: 1963.89 MB


No date and date range indexes:
===
Classname, Count, Size (kB), Size (MB), Avg (kB)

OFS.Folder.Folder, 1, 537, 0.52, 0.05
BTrees.Length.Length, 220002, 6379, 6.23, 0.03
Products.Archetypes.BaseUnit.BaseUnit, 3, 7504, 7.33, 0.25
Persistence.mapping.PersistentMapping, 2, 8242, 8.05, 0.41
Products.ATContentTypes.content.document.ATDocument, 1, 15057, 14.7, 1.51
BTrees._OIBTree.OIBTree, 2688, 16786, 16.39, 6.24
BTrees._IIBTree.IITreeSet, 35868, 20593, 20.11, 0.57
BTrees._OOBTree.OOBTree, 15898, 52997, 51.75, 3.33
BTrees._IIBTree.IIBTree, 143806, 53443, 52.19, 0.37
BTrees._IOBTree.IOBTree, 24944, 68928, 67.31, 2.76
BTrees._OOBTree.OOBucket, 115356, 70806, 69.15, 0.61
BTrees._OIBTree.OIBucket, 81022, 76848, 75.05, 0.95
BTrees._IIBTree.IIBucket, 251985, 163777, 159.94, 0.65
BTrees._IIBTree.IISet, 582704, 519508, 507.33, 0.89
BTrees._IOBTree.IOBucket, 644640, 1010896, 987.2, 1.57

Total size: 2043.26 MB


No metadata:

Classname, Count, Size (kB), Size (MB), Avg (kB)

OFS.Folder.Folder, 1, 537, 0.52, 0.05
BTrees.Length.Length, 220095, 6381, 6.23, 0.03
Products.Archetypes.BaseUnit.BaseUnit, 3, 7504, 7.33, 0.25
Persistence.mapping.PersistentMapping, 2, 8242, 8.05, 0.41
Products.ATContentTypes.content.document.ATDocument, 1, 15057, 14.7, 1.51
BTrees._OIBTree.OIBTree, 4608, 22971, 22.43, 4.99
BTrees._IIBTree.IITreeSet, 47718, 25185, 24.59, 0.53
BTrees._OOBTree.OOBTree, 15882, 52566, 51.33, 3.31
BTrees._IIBTree.IIBTree, 143948, 53520, 52.27, 0.37
BTrees._IOBTree.IOBTree, 25604, 69710, 68.08, 2.72
BTrees._OOBTree.OOBucket, 115340, 70774, 69.12, 0.61
BTrees._OIBTree.OIBucket, 132412, 101655, 99.27, 0.77
BTrees._IIBTree.IIBucket, 252127, 164492, 160.64, 0.65
BTrees._IOBTree.IOBucket, 655008, 689299, 673.14, 1.05
BTrees._IIBTree.IISet, 642280, 1027442, 1003.36, 1.6

Total size: 226

Re: [ZODB-Dev] BTree pickle size

2008-08-24 Thread Roché Compaan
This is the fsdump output for a single IOBTree:

  data #00032 oid=1bac size=5435 class=BTrees._IOBTree.IOBTree

What is persisted as part of the 5435 bytes? References to containing
buckets? What else?


-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] BTree pickle size

2008-08-23 Thread Roché Compaan
On Sun, 2008-08-24 at 08:08 +0200, [EMAIL PROTECTED] wrote:
> Tres Seaver wrote at 2008-8-22 16:45 -0400:
> > ...
> >I recall a pre-Zope (for me, 10 years ago) rule of thumb that text
> >indexing imposed an order of magnitude of overhead on the actual corpus,
> >with improvements possible only via batching or post-processing /
> >compression (incremental indexing is worst-case).
> 
> And this is especially true for indexes that support term-frequency-based
> ranking and that use "IISet" in places where "IITreeSet" would be
> more appropriate.
> 
> With "TextIndexNG3", one can get rid of the overhead of
> term-frequency-based ranking (in case one does not need it).
> 
> Using "AdvancedQuery" (and parsing the text subqueries oneself),
> one can use a "Managable SimpleTextIndex" which
> tries very hard to be as efficient as possible for large data sets
> (and does not support term frequency based ranking).

Thanks for the feedback. I'll re-run the tests without any text indexes,
as well as run it with other implementations such as TextIndexNG3 and
SimpleTextIndex and compare the results.

-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] BTree pickle size

2008-08-23 Thread Roché Compaan
On Sun, 2008-08-24 at 08:13 +0200, Dieter Maurer wrote:
> Roché Compaan wrote at 2008-8-23 19:31 +0200:
> >On Sat, 2008-08-23 at 14:09 +0200, Dieter Maurer wrote:
> >> Roché Compaan wrote at 2008-8-22 14:49 +0200:
> >> >I've been doing some benchmarks on Plone and got some surprising stats
> >> >on the pickle size of btrees and their buckets that are persisted with
> >> >each transaction. Surprising in the sense that they are very big in
> >> >relation to the actual data indexed. I would appreciate it if somebody
> >> >can help me understand what is going on, or just take a look to see if
> >> >the sizes look normal.
> >> >
> >> >In the benchmark I add and index 1 ATDocuments. I commit after each
> >> >document to simulate a transaction per request environment. Each
> >> >document has a 100 byte long description and 100 bytes in its body. The
> >> >total transaction size however is 40K in the beginning. The transaction
> >> >sizes grow linearly to about 350K when reaching 1 documents.
> >> 
> >> The "Bucket" nodes store usually between 22 ("OOBucket") and 90 ("IIBucket")
> >> objects in a single bucket.
> >> 
> >> With any change, the transaction will contain unmodified data
> >> for several dozen other objects.
> >
> >Are you saying *all* 22 OOBuckets and 90 IIBuckets will be persisted
> >again whether they are modified or not?
> 
> I did not speak of "22 OOBuckets" but of typically 22 entries in an
> "OOBucket" (similarly for "IIBucket").
> 
> And indeed, when a single entry in an "OOBucket" is changed, then all
> entries are rewritten even if the other entries did not change.
> 
> That is because the ZODB load/store granularity is the persistent object
> (without persistent subobjects). An "OOBucket" is a persistent object --
> it is loaded/stored always as a whole (all entries together).

Yes, that's how I understand it. I misunderstood your original
statement and thought that there is something else at play here.

-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] BTree pickle size

2008-08-23 Thread Dieter Maurer
Dieter Maurer wrote at 2008-8-23 14:09 +0200:
> ...
>A typical "IISet" contains 90 value records and a persistent reference.
>
>I expect that an integer is pickled in 5 bytes. Thus, about 0.5 kB
>should be expected as typical size of an "IISet".
>Your "IISet" instances seem to be about 1.5 kB large.
>
>That is significantly larger than I would expect but maybe not
>yet something to worry about.

The larger than expected size probably results from a use of "IISet"
at a place where "IITreeSet" would have been better.
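
To illustrate why the choice matters, here is a rough model of how many bytes get stored again when one integer is added. It assumes about 5 bytes per integer (the estimate quoted above) and a hypothetical bucket size of 90 entries for the tree variant; bucket splits and updates to parent nodes are ignored, so treat it as a sketch and not as the BTrees implementation.

```python
BYTES_PER_INT = 5        # estimate from this thread, not a measured value
TREE_BUCKET_SIZE = 90    # assumed entries per bucket, for illustration only


def bytes_rewritten_flat(n_entries):
    """A flat set is one persistent object: every entry is stored again."""
    return n_entries * BYTES_PER_INT


def bytes_rewritten_tree(n_entries):
    """A tree set stores only the one bucket that received the new entry."""
    return min(n_entries, TREE_BUCKET_SIZE) * BYTES_PER_INT


for n in (90, 1000, 10000):
    print(n, bytes_rewritten_flat(n), bytes_rewritten_tree(n))
```

Under these assumptions the flat set's cost per change grows with its size, while the tree variant stays bounded by one bucket.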



-- 
Dieter


Re: [ZODB-Dev] BTree pickle size

2008-08-23 Thread Dieter Maurer
Jean Jordaan wrote at 2008-8-23 20:44 +0700:
>> That is significantly larger than I would expect but maybe not
>> yet something to worry about.
>
>[...]
>
>> Your "IIBuckets" are smaller than one would expect.
>
>These are plain ATDocuments, so either Plone's behaviour is unexpected
>or the measurement is off.

They are likely not yet as full as one would expect.



-- 
Dieter


Re: [ZODB-Dev] BTree pickle size

2008-08-23 Thread Dieter Maurer
Roché Compaan wrote at 2008-8-23 19:31 +0200:
>On Sat, 2008-08-23 at 14:09 +0200, Dieter Maurer wrote:
>> Roché Compaan wrote at 2008-8-22 14:49 +0200:
>> >I've been doing some benchmarks on Plone and got some surprising stats
>> >on the pickle size of btrees and their buckets that are persisted with
>> >each transaction. Surprising in the sense that they are very big in
>> >relation to the actual data indexed. I would appreciate it if somebody
>> >can help me understand what is going on, or just take a look to see if
>> >the sizes look normal.
>> >
>> >In the benchmark I add and index 1 ATDocuments. I commit after each
>> >document to simulate a transaction per request environment. Each
>> >document has a 100 byte long description and 100 bytes in its body. The
>> >total transaction size however is 40K in the beginning. The transaction
>> >sizes grow linearly to about 350K when reaching 1 documents.
>> 
>> The "Bucket" nodes store usually between 22 ("OOBucket") and 90 ("IIBucket")
>> objects in a single bucket.
>> 
>> With any change, the transaction will contain unmodified data
>> for several dozen other objects.
>
>Are you saying *all* 22 OOBuckets and 90 IIBuckets will be persisted
>again whether they are modified or not?

I did not speak of "22 OOBuckets" but of typically 22 entries in an
"OOBucket" (similarly for "IIBucket").

And indeed, when a single entry in an "OOBucket" is changed, then all
entries are rewritten even if the other entries did not change.

That is because the ZODB load/store granularity is the persistent object
(without persistent subobjects). An "OOBucket" is a persistent object --
it is loaded/stored always as a whole (all entries together).
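
The granularity point can be sketched with the standard pickle module. The dictionary below is only a toy stand-in for an "OOBucket" with 22 entries (the key and value strings are invented), and the flattening into one tuple is an assumption made for illustration; it is not the real BTrees serialization code.

```python
import pickle

# Toy stand-in for a bucket: 22 key/value pairs, all stored in one record.
entries = {f"key-{i:02d}": f"value-{i:02d}" for i in range(22)}


def bucket_pickle(mapping):
    """Pickle all entries together, the way one persistent object is stored."""
    flat = tuple(item for pair in sorted(mapping.items()) for item in pair)
    return pickle.dumps(flat, protocol=1)


before = bucket_pickle(entries)
entries["key-07"] = "changed!"          # modify a single entry
after = bucket_pickle(entries)

changed_entry = pickle.dumps(("key-07", "changed!"), protocol=1)
print("record written for the bucket:", len(after), "bytes")
print("the changed entry alone:      ", len(changed_entry), "bytes")
```

Changing one entry produces a new record that again carries all 22 entries, which is many times larger than the entry that actually changed.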



-- 
Dieter


Re: [ZODB-Dev] BTree pickle size

2008-08-23 Thread dieter
Tres Seaver wrote at 2008-8-22 16:45 -0400:
> ...
>I recall a pre-Zope (for me, 10 years ago) rule of thumb that text
>indexing imposed an order of magnitude of overhead on the actual corpus,
>with improvements possible only via batching or post-processing /
>compression (incremental indexing is worst-case).

And this is especially true for indexes that support term-frequency-based
ranking and that use "IISet" in places where "IITreeSet" would be
more appropriate.

With "TextIndexNG3", one can get rid of the overhead of
term-frequency-based ranking (in case one does not need it).

Using "AdvancedQuery" (and parsing the text subqueries oneself),
one can use a "Managable SimpleTextIndex" which
tries very hard to be as efficient as possible for large data sets
(and does not support term frequency based ranking).



-- 
Dieter


Re: [ZODB-Dev] BTree pickle size

2008-08-23 Thread Roché Compaan
On Sat, 2008-08-23 at 19:31 +0200, Roché Compaan wrote:
> I am curious to know if you can explain why the proportion of actual
> to total transaction size is so small?

Sorry, that sentence isn't clear. I meant to say "the proportion of
actual data on the document to the total transaction size".

-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] BTree pickle size

2008-08-23 Thread Roché Compaan
On Sat, 2008-08-23 at 14:09 +0200, Dieter Maurer wrote:
> Roché Compaan wrote at 2008-8-22 14:49 +0200:
> >I've been doing some benchmarks on Plone and got some surprising stats
> >on the pickle size of btrees and their buckets that are persisted with
> >each transaction. Surprising in the sense that they are very big in
> >relation to the actual data indexed. I would appreciate it if somebody
> >can help me understand what is going on, or just take a look to see if
> >the sizes look normal.
> >
> >In the benchmark I add and index 1 ATDocuments. I commit after each
> >document to simulate a transaction per request environment. Each
> >document has a 100 byte long description and 100 bytes in its body. The
> >total transaction size however is 40K in the beginning. The transaction
> >sizes grow linearly to about 350K when reaching 1 documents.
> 
> The "Bucket" nodes store usually between 22 ("OOBucket") and 90 ("IIBucket")
> objects in a single bucket.
> 
> With any change, the transaction will contain unmodified data
> for several dozen other objects.

Are you saying *all* 22 OOBuckets and 90 IIBuckets will be persisted
again whether they are modified or not?

> 
> >What concerns me is that the footprint of indexed data in terms of
> >BTrees, Buckets and Sets is huge! The total amount of data committed
> >that relates directly to ATDocument is around 30 Mbyte. The total for
> >BTrees, Buckets and IISets is more than 2 Gbyte. Even taking into
> >account that Plone has a lot of catalog indexes and metadata columns (I
> >think 71 in total), this seems very high. 
> >
> >This is a summary of total data committed per class:
> >
> >Classname,Object Count,Total Size (Kbytes)
> >BTrees._IIBTree.IISet,640686,1024506
> 
> A typical "IISet" contains 90 value records and a persistent reference.
> 
> I expect that an integer is pickled in 5 bytes. Thus, about 0.5 kB
> should be expected as typical size of an "IISet".
> Your "IISet" instances seem to be about 1.5 kB large.
> 
> That is significantly larger than I would expect but maybe not
> yet something to worry about.

It looks like there is something to be worried about since there are
quite a few IISet instances that are larger than 0.5 kB. Some are as
large as 50K! Here are some lines from fsdump:

  data #00033 oid=1d65 size=50058 class=BTrees._IIBTree.IISet
  data #00034 oid=1d66 size=50058 class=BTrees._IIBTree.IISet
  data #00111 oid=1e0b size=50023 class=BTrees._IIBTree.IISet
  data #00033 oid=1d65 size=50063 class=BTrees._IIBTree.IISet
  data #00034 oid=1d66 size=50063 class=BTrees._IIBTree.IISet
  data #00109 oid=1e0b size=50028 class=BTrees._IIBTree.IISet
  data #00035 oid=1d65 size=50068 class=BTrees._IIBTree.IISet
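
Records like these can be picked out of fsdump output with a small script. The sketch below assumes the "data #N oid=... size=... class=..." layout shown above; the sample text, the threshold and the function name are made up for illustration, and the entry estimate assumes roughly 5 bytes per pickled integer.

```python
import re

# Sample records in the layout shown above, plus the single IOBTree record
# mentioned elsewhere in this thread.
SAMPLE = """\
  data #00033 oid=1d65 size=50058 class=BTrees._IIBTree.IISet
  data #00034 oid=1d66 size=50058 class=BTrees._IIBTree.IISet
  data #00111 oid=1e0b size=50023 class=BTrees._IIBTree.IISet
  data #00032 oid=1bac size=5435 class=BTrees._IOBTree.IOBTree
"""

# \s+ before "class=" also matches records that were wrapped onto two lines.
RECORD = re.compile(r"data #(\d+) oid=([0-9a-f]+) size=(\d+)\s+class=(\S+)")


def large_records(text, threshold=10_000):
    """Yield (oid, class name, size) for records at or above the threshold."""
    for match in RECORD.finditer(text):
        _, oid, size, cls = match.groups()
        if int(size) >= threshold:
            yield oid, cls, int(size)


for oid, cls, size in large_records(SAMPLE):
    # Rough entry count, assuming about 5 bytes per pickled integer.
    print(oid, cls, size, "bytes, roughly", size // 5, "integers")
```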


> >BTrees._IIBTree.IIBucket,252121,163524
> 
> The same size reasoning applies to "IIBucket"s: 90 records, but
> now consisting of key and value (about 10 bytes).
> 
> Your "IIBuckets" are smaller than one would expect.

But that is supposedly ok?

I am curious to know if you can explain why the proportion of actual to
total transaction size is so small?

-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] BTree pickle size

2008-08-23 Thread Jean Jordaan
> That is significantly larger than I would expect but maybe not
> yet something to worry about.

[...]

> Your "IIBuckets" are smaller than one would expect.

These are plain ATDocuments, so either Plone's behaviour is unexpected
or the measurement is off.

-- 
jean . ..  //\\\oo///\\


Re: [ZODB-Dev] BTree pickle size

2008-08-23 Thread Dieter Maurer
Roché Compaan wrote at 2008-8-22 14:49 +0200:
>I've been doing some benchmarks on Plone and got some surprising stats
>on the pickle size of btrees and their buckets that are persisted with
>each transaction. Surprising in the sense that they are very big in
>relation to the actual data indexed. I would appreciate it if somebody
>can help me understand what is going on, or just take a look to see if
>the sizes look normal.
>
>In the benchmark I add and index 1 ATDocuments. I commit after each
>document to simulate a transaction per request environment. Each
>document has a 100 byte long description and 100 bytes in its body. The
>total transaction size however is 40K in the beginning. The transaction
>sizes grow linearly to about 350K when reaching 1 documents.

The "Bucket" nodes store usually between 22 ("OOBucket") and 90 ("IIBucket")
objects in a single bucket.

With any change, the transaction will contain unmodified data
for several dozen other objects.

>What concerns me is that the footprint of indexed data in terms of
>BTrees, Buckets and Sets is huge! The total amount of data committed
>that relates directly to ATDocument is around 30 Mbyte. The total for
>BTrees, Buckets and IISets is more than 2 Gbyte. Even taking into
>account that Plone has a lot of catalog indexes and metadata columns (I
>think 71 in total), this seems very high. 
>
>This is a summary of total data committed per class:
>
>Classname,Object Count,Total Size (Kbytes)
>BTrees._IIBTree.IISet,640686,1024506

A typical "IISet" contains 90 value records and a persistent reference.

I expect that an integer is pickled in 5 bytes. Thus, about 0.5 kB
should be expected as typical size of an "IISet".
Your "IISet" instances seem to be about 1.5 kB large.

That is significantly larger than I would expect but maybe not
yet something to worry about.
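
The 5 bytes per integer can be checked with the standard pickle module. This sketch uses pickle protocol 1 as an assumption about the format; ZODB's own serializer adds class and reference information that is not modelled here. Note that small non-negative integers take fewer bytes than 5.

```python
import pickle


def pickled_int_bytes(n):
    """Bytes used for one integer, without the trailing STOP opcode."""
    return len(pickle.dumps(n, protocol=1)) - 1


for n in (7, 300, 100_000):
    print(n, pickled_int_bytes(n))

# Expected payload of a set holding 90 integers at 5 bytes each.
expected_iiset_bytes = 90 * pickled_int_bytes(100_000)
print(expected_iiset_bytes)
```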


> ...
>BTrees._IIBTree.IIBucket,252121,163524

The same size reasoning applies to "IIBucket"s: 90 records, but
now consisting of key and value (about 10 bytes).

Your "IIBuckets" are smaller than one would expect.
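
For comparison, the observed averages can be derived from the two summary rows quoted above and set against the estimates (90 records at 5 or 10 bytes each). The dictionary names are made up for this sketch.

```python
# (object count, total size in kB) from the per-class summary quoted above.
summary = {
    "BTrees._IIBTree.IISet": (640686, 1024506),
    "BTrees._IIBTree.IIBucket": (252121, 163524),
}

# Back-of-the-envelope expectations in kB: 90 records at 5 or 10 bytes each.
expected_kb = {
    "BTrees._IIBTree.IISet": 90 * 5 / 1024,
    "BTrees._IIBTree.IIBucket": 90 * 10 / 1024,
}

average_kb = {name: size / count for name, (count, size) in summary.items()}

for name, avg in average_kb.items():
    print(f"{name}: average {avg:.2f} kB, expected about {expected_kb[name]:.2f} kB")
```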



-- 
Dieter


Re: [ZODB-Dev] BTree pickle size

2008-08-22 Thread Tres Seaver
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Roché Compaan wrote:
> On Fri, 2008-08-22 at 16:37 -0300, Sidnei da Silva wrote:
>> On Fri, Aug 22, 2008 at 9:49 AM, Roché Compaan
>> <[EMAIL PROTECTED]> wrote:
>>> Transaction detail for txn #00099 (first document):
>>> Txn id,Classname,Object count,Size (bytes)
>>> #00099,BTrees._IIBTree.IIBTree,3,286
>>> #00099,OFS.Folder.Folder,1,55
>>> #00099,BTrees._IOBTree.IOBucket,9,4572
>>> #00099,BTrees._OIBTree.OIBucket,5,2964
>>> #00099,BTrees._IOBTree.IOBTree,39,17552
>>> #00099,BTrees.Length.Length,27,768
>>> #00099,Persistence.mapping.PersistentMapping,2,846
>>> #00099,Products.ATContentTypes.content.document.ATDocument,1,1544
>>> #00099,BTrees._OOBTree.OOBTree,20,3986
>>> #00099,BTrees._IIBTree.IISet,3,184
>>> #00099,BTrees._OIBTree.OIBTree,9,1404
>>> #00099,Products.Archetypes.BaseUnit.BaseUnit,3,767
>>> #00099,BTrees._OOBTree.OOBucket,2,3286
>>> #00099,BTrees._IIBTree.IITreeSet,55,3905
>>>
>>> Transaction detail for txn #10099 (last document):
>>>
>>> Txn id,Classname,Object count,Size (bytes)
>>> #10099,BTrees._IIBTree.IIBTree,8,2517
>>> #10099,OFS.Folder.Folder,1,55
>>> #10099,BTrees._IOBTree.IOBucket,57,81564
>>> #10099,BTrees._OIBTree.OIBucket,13,9872
>>> #10099,BTrees._IIBTree.IIBucket,29,20024
>>> #10099,BTrees._IOBTree.IOBTree,1,85
>>> #10099,Persistence.mapping.PersistentMapping,2,846
>>> #10099,BTrees.Length.Length,22,655
>>> #10099,Products.ATContentTypes.content.document.ATDocument,1,1544
>>> #10099,BTrees._OOBTree.OOBTree,6,30455
>>> #10099,BTrees._IIBTree.IISet,65,182708
>>> #10099,Products.Archetypes.BaseUnit.BaseUnit,3,767
>>> #10099,BTrees._OOBTree.OOBucket,16,8088
>>> #10099,BTrees._IIBTree.IITreeSet,2,122
>> It's pretty clear that the difference here is the IISet(65 vs 3) and
>> the IOBucket(57 vs 9). The rest looks pretty much stable. Now, if I
>> understand correctly that means the last document caused 57 IOBuckets
>> to be modified, but not necessarily created.
> 
> Right. But even looking at the very first transaction the indexing
> overhead is visible: 3 Kbytes of data related to the document (ATDoc,
> BaseUnit, PersistentMapping) is only a fraction of the total transaction
> size of 40 Kbytes.

I recall a pre-Zope (for me, 10 years ago) rule of thumb that text
indexing imposed an order of magnitude of overhead on the actual corpus,
with improvements possible only via batching or post-processing /
compression (incremental indexing is worst-case).


Tres.
- --
===
Tres Seaver  +1 540-429-0999  [EMAIL PROTECTED]
Palladion Software   "Excellence by Design"   http://palladion.com
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIryVR+gerLs4ltQ4RAv7CAKC68bT3zmp5P1xOpxCX+TpoVg/qJACcC1rv
5oQeHxjFc3iCkJz8o09awP0=
=wYKj
-END PGP SIGNATURE-



Re: [ZODB-Dev] BTree pickle size

2008-08-22 Thread Roché Compaan
On Fri, 2008-08-22 at 16:37 -0300, Sidnei da Silva wrote:
> On Fri, Aug 22, 2008 at 9:49 AM, Roché Compaan
> <[EMAIL PROTECTED]> wrote:
> > Transaction detail for txn #00099 (first document):
> >
> > Txn id,Classname,Object count,Size (bytes)
> > #00099,BTrees._IIBTree.IIBTree,3,286
> > #00099,OFS.Folder.Folder,1,55
> > #00099,BTrees._IOBTree.IOBucket,9,4572
> > #00099,BTrees._OIBTree.OIBucket,5,2964
> > #00099,BTrees._IOBTree.IOBTree,39,17552
> > #00099,BTrees.Length.Length,27,768
> > #00099,Persistence.mapping.PersistentMapping,2,846
> > #00099,Products.ATContentTypes.content.document.ATDocument,1,1544
> > #00099,BTrees._OOBTree.OOBTree,20,3986
> > #00099,BTrees._IIBTree.IISet,3,184
> > #00099,BTrees._OIBTree.OIBTree,9,1404
> > #00099,Products.Archetypes.BaseUnit.BaseUnit,3,767
> > #00099,BTrees._OOBTree.OOBucket,2,3286
> > #00099,BTrees._IIBTree.IITreeSet,55,3905
> >
> > Transaction detail for txn #10099 (last document):
> >
> > Txn id,Classname,Object count,Size (bytes)
> > #10099,BTrees._IIBTree.IIBTree,8,2517
> > #10099,OFS.Folder.Folder,1,55
> > #10099,BTrees._IOBTree.IOBucket,57,81564
> > #10099,BTrees._OIBTree.OIBucket,13,9872
> > #10099,BTrees._IIBTree.IIBucket,29,20024
> > #10099,BTrees._IOBTree.IOBTree,1,85
> > #10099,Persistence.mapping.PersistentMapping,2,846
> > #10099,BTrees.Length.Length,22,655
> > #10099,Products.ATContentTypes.content.document.ATDocument,1,1544
> > #10099,BTrees._OOBTree.OOBTree,6,30455
> > #10099,BTrees._IIBTree.IISet,65,182708
> > #10099,Products.Archetypes.BaseUnit.BaseUnit,3,767
> > #10099,BTrees._OOBTree.OOBucket,16,8088
> > #10099,BTrees._IIBTree.IITreeSet,2,122
> 
> It's pretty clear that the difference here is the IISet(65 vs 3) and
> the IOBucket(57 vs 9). The rest looks pretty much stable. Now, if I
> understand correctly that means the last document caused 57 IOBuckets
> to be modified, but not necessarily created.

Right. But even looking at the very first transaction the indexing
overhead is visible: 3 Kbytes of data related to the document (ATDoc,
BaseUnit, PersistentMapping) is only a fraction of the total transaction
size of 40 Kbytes.
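
The proportion can be worked out from the first transaction's table. The sizes below are copied from the rows quoted above; grouping ATDocument, BaseUnit and PersistentMapping as "document data" follows the sentence above, and the variable names are only illustrative.

```python
# Size in bytes per class for txn #00099, copied from the table above.
first_txn = {
    "BTrees._IIBTree.IIBTree": 286,
    "OFS.Folder.Folder": 55,
    "BTrees._IOBTree.IOBucket": 4572,
    "BTrees._OIBTree.OIBucket": 2964,
    "BTrees._IOBTree.IOBTree": 17552,
    "BTrees.Length.Length": 768,
    "Persistence.mapping.PersistentMapping": 846,
    "Products.ATContentTypes.content.document.ATDocument": 1544,
    "BTrees._OOBTree.OOBTree": 3986,
    "BTrees._IIBTree.IISet": 184,
    "BTrees._OIBTree.OIBTree": 1404,
    "Products.Archetypes.BaseUnit.BaseUnit": 767,
    "BTrees._OOBTree.OOBucket": 3286,
    "BTrees._IIBTree.IITreeSet": 3905,
}

DOCUMENT_CLASSES = (
    "Products.ATContentTypes.content.document.ATDocument",
    "Products.Archetypes.BaseUnit.BaseUnit",
    "Persistence.mapping.PersistentMapping",
)

total = sum(first_txn.values())
document_bytes = sum(first_txn[name] for name in DOCUMENT_CLASSES)

print(f"total: {total} bytes, document data: {document_bytes} bytes "
      f"({100 * document_bytes / total:.1f} %)")
```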

> I wonder whether you used QueueCatalog and, if you didn't, what it would
> look like if you did.
> 

I didn't use QueueCatalog, but if I did, I don't think it would have
made any difference in the size of the objects since the only difference
is that indexing is delayed. It could however make a big difference in
the total size of the Data.fs in that fewer revisions of set and bucket
instances would be persisted.
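
A toy model of that effect, not of QueueCatalog itself: a single set grows by one integer at a time, and every commit stores the whole set again. The 5 bytes per entry and the batch sizes are assumptions chosen for illustration.

```python
BYTES_PER_ENTRY = 5   # assumed size of one pickled integer


def bytes_written(n_entries, batch_size):
    """Total bytes stored when a growing set is committed every batch_size adds."""
    total = 0
    for count in range(1, n_entries + 1):
        is_commit = count % batch_size == 0 or count == n_entries
        if is_commit:
            total += count * BYTES_PER_ENTRY   # the whole set is written again
    return total


print(bytes_written(90, batch_size=1))    # one commit per added entry
print(bytes_written(90, batch_size=10))   # one commit per ten added entries
```

The final revision has the same size in both cases; only the number of intermediate revisions kept in the Data.fs differs.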

-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] BTree pickle size

2008-08-22 Thread Sidnei da Silva
On Fri, Aug 22, 2008 at 9:49 AM, Roché Compaan
<[EMAIL PROTECTED]> wrote:
> Transaction detail for txn #00099 (first document):
>
> Txn id,Classname,Object count,Size (bytes)
> #00099,BTrees._IIBTree.IIBTree,3,286
> #00099,OFS.Folder.Folder,1,55
> #00099,BTrees._IOBTree.IOBucket,9,4572
> #00099,BTrees._OIBTree.OIBucket,5,2964
> #00099,BTrees._IOBTree.IOBTree,39,17552
> #00099,BTrees.Length.Length,27,768
> #00099,Persistence.mapping.PersistentMapping,2,846
> #00099,Products.ATContentTypes.content.document.ATDocument,1,1544
> #00099,BTrees._OOBTree.OOBTree,20,3986
> #00099,BTrees._IIBTree.IISet,3,184
> #00099,BTrees._OIBTree.OIBTree,9,1404
> #00099,Products.Archetypes.BaseUnit.BaseUnit,3,767
> #00099,BTrees._OOBTree.OOBucket,2,3286
> #00099,BTrees._IIBTree.IITreeSet,55,3905
>
> Transaction detail for txn #10099 (last document):
>
> Txn id,Classname,Object count,Size (bytes)
> #10099,BTrees._IIBTree.IIBTree,8,2517
> #10099,OFS.Folder.Folder,1,55
> #10099,BTrees._IOBTree.IOBucket,57,81564
> #10099,BTrees._OIBTree.OIBucket,13,9872
> #10099,BTrees._IIBTree.IIBucket,29,20024
> #10099,BTrees._IOBTree.IOBTree,1,85
> #10099,Persistence.mapping.PersistentMapping,2,846
> #10099,BTrees.Length.Length,22,655
> #10099,Products.ATContentTypes.content.document.ATDocument,1,1544
> #10099,BTrees._OOBTree.OOBTree,6,30455
> #10099,BTrees._IIBTree.IISet,65,182708
> #10099,Products.Archetypes.BaseUnit.BaseUnit,3,767
> #10099,BTrees._OOBTree.OOBucket,16,8088
> #10099,BTrees._IIBTree.IITreeSet,2,122

It's pretty clear that the difference here is the IISet(65 vs 3) and
the IOBucket(57 vs 9). The rest looks pretty much stable. Now, if I
understand correctly that means the last document caused 57 IOBuckets
to be modified, but not necessarily created.
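
The per-class change in object counts between the two transactions can be listed with a few lines of Python. The counts are copied from the two tables quoted above; classes missing from one table are counted as zero, which is an assumption of this sketch.

```python
# Object counts per class, copied from the two transaction tables above.
first = {
    "BTrees._IIBTree.IIBTree": 3, "OFS.Folder.Folder": 1,
    "BTrees._IOBTree.IOBucket": 9, "BTrees._OIBTree.OIBucket": 5,
    "BTrees._IOBTree.IOBTree": 39, "BTrees.Length.Length": 27,
    "Persistence.mapping.PersistentMapping": 2,
    "Products.ATContentTypes.content.document.ATDocument": 1,
    "BTrees._OOBTree.OOBTree": 20, "BTrees._IIBTree.IISet": 3,
    "BTrees._OIBTree.OIBTree": 9, "Products.Archetypes.BaseUnit.BaseUnit": 3,
    "BTrees._OOBTree.OOBucket": 2, "BTrees._IIBTree.IITreeSet": 55,
}
last = {
    "BTrees._IIBTree.IIBTree": 8, "OFS.Folder.Folder": 1,
    "BTrees._IOBTree.IOBucket": 57, "BTrees._OIBTree.OIBucket": 13,
    "BTrees._IIBTree.IIBucket": 29, "BTrees._IOBTree.IOBTree": 1,
    "Persistence.mapping.PersistentMapping": 2, "BTrees.Length.Length": 22,
    "Products.ATContentTypes.content.document.ATDocument": 1,
    "BTrees._OOBTree.OOBTree": 6, "BTrees._IIBTree.IISet": 65,
    "Products.Archetypes.BaseUnit.BaseUnit": 3,
    "BTrees._OOBTree.OOBucket": 16, "BTrees._IIBTree.IITreeSet": 2,
}

classes = sorted(set(first) | set(last))
delta = {name: last.get(name, 0) - first.get(name, 0) for name in classes}

# Largest increases first.
for name, change in sorted(delta.items(), key=lambda item: item[1], reverse=True):
    print(f"{change:+4d}  {name}")
```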

I wonder whether you used QueueCatalog and, if you didn't, what it would
look like if you did.

-- 
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214