Re: [ZODB-Dev] Advice needed

2006-06-25 Thread Dieter Maurer
Roché Compaan wrote at 2006-6-25 11:13 +0200:
 ...
 Careful design with respect to the granularity and locality
 of persistent objects:
 
   Move groups of large and rarely used attributes out
   into persistent subobjects.

Will this lead to smaller transaction sizes for objects that store large
attributes as persistent subobjects?

It can mean fewer large transactions, as you get a large
transaction only when you modify the large attribute,
not when you make an innocent modification (to a small attribute).

It can also considerably reduce load time as you fetch
the subobject with the large attributes only when you need this
large content (and not when you happen to access the object
for other unrelated reasons). Your memory consumption can go down
as well.
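The idea can be sketched in Python. The `Persistent` class below is a stand-in for ZODB's `persistent.Persistent` (used here so the sketch runs standalone); with the real base class, each instance becomes its own database record, fetched from the storage on first access:

```python
class Persistent:                    # stand-in for persistent.Persistent
    pass

class LargeContent(Persistent):
    """Large, rarely used data in its own persistent record."""
    def __init__(self, body):
        self.body = body             # e.g. a big SGML document

class Document(Persistent):
    def __init__(self, title, body):
        self.title = title                    # small, frequently touched
        self.content = LargeContent(body)     # separate record, loaded lazily

doc = Document("Advice needed", "<sgml>" + "x" * 10000 + "</sgml>")
# Changing only the title rewrites only Document's small record; with real
# ZODB, doc.content.body would be fetched from the storage only on demand.
doc.title = "Re: Advice needed"
```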

 ...
This is interesting! ZCatalog metadata is stored in an IOBTree. Are you
saying the values in the IOBTree should be wrapped in a class that
subclasses Persistent, rather than stored as raw values?

Yes -- at least as soon as the value gets larger.

The ZCatalog, and indexing in general, seems to be the biggest problem
with ZODB applications.

That's because they have to handle huge amounts of data and objects.

BTrees do not seem to fix the problem.

If you speak about conflicts: they reduce the conflict probability
by about 30 to 80 times (depending on type). That's not too bad...

 ...
Can somebody describe a ZODB indexing implementation

There is no ZODB indexing implementation. There is a ZCatalog one.

where an
application can index new objects concurrently? Given the above, this is
probably not a stupid question.

We won't be able to ban conflict errors from the ZODB -- not even
when we consider only indexing applications.
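In practice applications cope by retrying the transaction (Zope's publisher does this automatically on ConflictError). A minimal sketch of the pattern; the local `ConflictError` class stands in for `ZODB.POSException.ConflictError` so the snippet is self-contained:

```python
class ConflictError(Exception):      # stand-in for ZODB.POSException.ConflictError
    pass

def run_with_retries(txn_body, retries=3):
    """Run txn_body, replaying it when a concurrent writer conflicts."""
    for attempt in range(retries):
        try:
            return txn_body()        # with real ZODB: commit happens in here
        except ConflictError:
            if attempt == retries - 1:
                raise                # give up after the last attempt
            # with real ZODB: transaction.abort(), then replay on fresh state

attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConflictError()        # simulate two losing races
    return "committed"

result = run_with_retries(flaky)     # succeeds on the third try
```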


I thought that it might be possible to extend the BTrees conflict
resolution, which is now restricted to the leaf level, to
one level up. This would reduce the conflict probability again
by a factor of 50 to 100. However, conflict resolution is a *very*
tricky task and it is quite easy to err and to introduce nasty, highly
non-deterministic bugs. I haven't dared to attack this problem...



-- 
Dieter
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Advice needed

2006-06-25 Thread Andreas Jung



--On 25 June 2006 21:33:21 +0200 Dieter Maurer [EMAIL PROTECTED] wrote:


Andreas Jung wrote at 2006-6-24 22:36 +0200:

...

   The ZODB model (object data stored in a storage with behaviour
   coded in the clients) is powerful enough to simply
   implement the relational database data structures: tables
   and indexes and their corresponding operations.

   After this implementation, you could perform the same
   complex queries against the records maintained in
   these data structures.


Of course you *could* do it, but you can also shoot yourself in the
knee with a gun.


But, these two things have nothing in common ;-)


Of course they have! Normal people should, and would, use the tools that are 
available and that solve their problems. You know better: you can solve 
any problem with any tool or language, but implementing such 
functionality is likely beyond the horizon of most developers (I know 
you're far beyond this horizon)... so I am usually talking about the average 
programmer/developer.




And in fact, I plan to do the former at some point in the future
(but I doubt that I will ever shoot myself in the knee).


I doubt that you have a gun :-)

-aj



Re: [ZODB-Dev] Advice needed

2006-06-24 Thread Roché Compaan
On Fri, 2006-06-23 at 22:03 +0200, Dieter Maurer wrote:
 Adam Groszer wrote at 2006-6-23 14:27 +0200:
  ...
 Some fears they are having and I can't find unambiguous information:
 - Is ZODB a good choice for this app?
 
 It depends...
 At least careful design is necessary!
 
 The most problematic aspect of the ZODB is write conflicts.
 
When two concurrent transactions modify the same object
a write conflict will occur unless the object provides
application specific conflict resolution (which might
resolve some conflicts).
 
This behaviour is especially serious when you have
expensive (long running) transactions. The longer
the transaction runs the higher is the risk that
it interferes with another transaction and
the higher is the cost of the resulting conflict.
 
This means that you must carefully design your system
to reduce the risk of write conflicts.
 
You can (and should!) use workflow, for example, to
prevent the same application object (read: a document, in your
case) from being modified by concurrent transactions.
You are interested in this for other reasons as well
(you do not want to wipe out a colleague's work by overwriting
his changes).
This can clear this side (application objects) of the front.
 
However, there are also global objects which can be
modified concurrently. The most prominent example:
catalog data structures (speak indexes).
They do use application specific conflict resolution -- but
it is often not good enough...
For the catalog, you can move indexing operations out to
a separate thread (done e.g. by the QueueCatalog Zope product).
This considerably reduces write conflicts, at the cost that
indexing operations are no longer inline but lag a bit behind.

I am curious what other strategies besides QueueCatalog you employ? Do
you ever use multiple backends for your apps? How do you decide that
this data belongs in a relational backend? How structured must the data
be, or how many records must be written how often?

I find most data is highly structured (fixed schema), but this doesn't
make me choose an RDMBS - the frequency of writes, concurrency and
record volume does.

-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] Advice needed

2006-06-24 Thread Roché Compaan
On Sat, 2006-06-24 at 09:24 +0200, Andreas Jung wrote:
 
 --On 24 June 2006 08:53:43 +0200 Roché Compaan 
 [EMAIL PROTECTED] wrote:
 
  I am curious what other strategies besides QueueCatalog you employ? Do
  you ever use multiple backends for your apps? How do you decide that
  this data belongs in a relational backend? How structured must the data
  be, or how many records must be written how often?
 
  I find most data is highly structured (fixed schema), but this doesn't
  make me choose an RDBMS - the frequency of writes, concurrency and
  record volume does.
 
 It is often the case with more complex data models that you really 
 require an RDBMS. It is often the case that you have to perform complex 
 queries on your data. The ZCatalog is often just too weak and too slow for 
 such apps. Our CMS (to which Dieter often refers) uses the ZODB to store 
 large amounts of SGML data. Since the documents fit perfectly inside a 
 hierarchy, the ZODB is the perfect choice. However, we have some apps around 
 the CMS that provide additional functionality or use the CMS for a 
 different purpose. Some of these apps use the content from the ZODB but 
 store their metadata in Postgres. One not unimportant advantage is that 
 other people can run their own reports etc. using a Postgres client without 
 approaching me to write a script or something in Python to get the requested 
 data out of Zope.
 One particular app that I have been working on uses very complex queries 
 with lots of joins etc. ... it would be hard to implement such queries 
 on top of the ZODB/ZCatalog. Another point is performance: this app often 
 has to perform a lot of insert/update/delete operations within one 
 transaction (up to 1000 modifications). Postgres takes perhaps 5 seconds 
 for such a complex operation. You will never reach that performance with 
 Zope... touching 1000 objects and reindexing them will take much longer 
 (unless you adjust your data model to the ZODB's needs for performance
 reasons).

Thanks for sharing this example, it confirms the usefulness of a hybrid
backend. In a way it's a sanity check for me.

 So what I want to say is: the bullheaded idea to stick everything into the 
 ZODB or into a RDBMS is just stupid. Smart people think about the data 
 storage before starting a project.

I agree wholeheartedly that it is stupid to put everything in one
backend if the data model requires a hybrid backend.

  That also reminds me of some postings on 
 the Plone mailing list where non-technical people put all their stuff inside 
 without knowing what's happening under the hood, and then wonder why everything 
 explodes or becomes horribly slow at some point.

I think the difference here is that technical people want a fundamental
understanding of what is going on under the hood so that they can
explain better what they already observe.

-- 
Roché Compaan
Upfront Systems   http://www.upfrontsystems.co.za



Re: [ZODB-Dev] Advice needed

2006-06-24 Thread Dieter Maurer
Roché Compaan wrote at 2006-6-24 08:53 +0200:
 ...
I am curious what other strategies besides QueueCatalog you employ?

Careful design with respect to the granularity and locality
of persistent objects:

  Move groups of large and rarely used attributes out
  into persistent subobjects.

  Move groups of small and often modified attributes out
  into persistent subobjects.

  If some attributes are usually used together, keep
  them in the same persistent subobject.

  Don't group large randomly accessed mass data into the same
  persistent object (as this stupid ZCatalog metadata implementation
  does). Use persistent wrappers instead.
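The last point can be sketched as follows. A plain dict stands in for `BTrees.IOBTree.IOBTree`, and the local `Persistent` class for `persistent.Persistent`, so the snippet runs on its own:

```python
class Persistent:                    # stand-in for persistent.Persistent
    pass

class MetadataRecord(Persistent):
    """One catalog metadata row as its own persistent record.

    Stored as a raw tuple, the row would be pickled inside the BTree
    bucket itself; as a Persistent wrapper, the bucket holds only a
    reference, and the row is loaded and rewritten independently."""
    def __init__(self, **fields):
        self.__dict__.update(fields)

metadata = {}                        # stand-in for BTrees.IOBTree.IOBTree()
metadata[42] = MetadataRecord(title="Advice needed", author="Dieter")
metadata[42].title = "Re: Advice needed"   # rewrites one small record only
```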

Do
you ever use multiple backends for your apps?

I have the tendency to keep everything in the ZODB -- but
I have colleagues fond of relational databases who do not let me...

Anyway: not even I would try to put the telephone book for Berlin,
or even New York, into the ZODB; I would use a relational database...

How do you decide that
this data belongs in a relational backend? How structured must the data
be, or how many records must be written how often?

I do not have a rigid catalog of criteria...

Relational databases handle records with simple field types well.
Their world quickly becomes nasty when the fields are not
of simple type but are structured, or lists/sets, or objects with
interesting behaviour (rather than pure data).

As I explained in a former message, relational databases are
also more efficient at searching (they filter on the server
rather than the client). Thus, the need for efficient searches
(over the datatypes supported by relational databases) indicates
the use of such a beast.

On the other hand, I would not put things into a relational database
that it cannot do anything with: e.g. blobs, complex data structures.
I would use the ZODB for these.

In between there is a large grey area where I tend to use the ZODB
and not the relational database.

I find most data is highly structured (fixed schema), but this doesn't
make me choose an RDBMS - the frequency of writes, concurrency and
record volume does.

Sure, you are right!

As explained, relational databases exploit the highly structured
property while the ZODB ignores it. This may not be relevant, though,
if your application does not perform mass operations on them.



-- 
Dieter


Re: [ZODB-Dev] Advice needed

2006-06-24 Thread Andreas Jung



--On 24 June 2006 21:08:03 +0200 Dieter Maurer [EMAIL PROTECTED] wrote:


Andreas Jung wrote at 2006-6-24 09:24 +0200:

...
One particular app that I have been working on uses very complex queries
with lots of joins etc. ... it would be hard to implement such
queries on top of the ZODB/ZCatalog.


I disagree with you (partially).

   The ZODB model (object data stored in a storage with behaviour
   coded in the clients) is powerful enough to simply
   implement the relational database data structures: tables
   and indexes and their corresponding operations.

   After this implementation, you could perform the same
   complex queries against the records maintained in
   these data structures.


Of course you *could* do it, but you can also shoot yourself in the knee 
with a gun. You could also implement Zope on top of a Turing machine, 
but nobody would do that :-)


Another point is performance: this app often
has to perform a lot of insert/update/delete operations within one
transaction (up to 1000 modifications).


You would not believe how many objects your TextIndexNG modifies
when it indexes a single document. Thousands is not very much ;-)



Patches are welcome :-)


-aj



Re: [ZODB-Dev] Advice needed

2006-06-23 Thread Dieter Maurer
Adam Groszer wrote at 2006-6-23 14:27 +0200:
 ...
Some fears they are having and I can't find unambiguous information:
- Is ZODB a good choice for this app?

It depends...
At least careful design is necessary!

The most problematic aspect of the ZODB is write conflicts.

   When two concurrent transactions modify the same object
   a write conflict will occur unless the object provides
   application specific conflict resolution (which might
   resolve some conflicts).

   This behaviour is especially serious when you have
   expensive (long running) transactions. The longer
   the transaction runs the higher is the risk that
   it interferes with another transaction and
   the higher is the cost of the resulting conflict.

   This means that you must carefully design your system
   to reduce the risk of write conflicts.

   You can (and should!) use workflow, for example, to
   prevent the same application object (read: a document, in your
   case) from being modified by concurrent transactions.
   You are interested in this for other reasons as well
   (you do not want to wipe out a colleague's work by overwriting
   his changes).
   This can clear this side (application objects) of the front.

   However, there are also global objects which can be
   modified concurrently. The most prominent example:
   catalog data structures (speak indexes).
   They do use application specific conflict resolution -- but
   it is often not good enough...
   For the catalog, you can move indexing operations out to
   a separate thread (done e.g. by the QueueCatalog Zope product).
   This considerably reduces write conflicts, at the cost that
   indexing operations are no longer inline but lag a bit behind.

   If you implement other global objects with a high write probability,
   you need to either implement application specific conflict
   resolution (which is not always easy) or carefully reduce
   the conflict probability (e.g. by relaying the changes to a separate
   thread).
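The "separate thread" idea (what QueueCatalog does) can be sketched with stdlib pieces. The dict `catalog` is a stand-in for the real index data structures:

```python
import queue
import threading

index_queue = queue.Queue()          # indexing requests from many transactions
catalog = {}                         # stand-in for the real index structures

def indexer():
    """Single writer: no two transactions ever modify the catalog."""
    while True:
        item = index_queue.get()
        if item is None:             # shutdown sentinel
            break
        oid, text = item
        catalog[oid] = text          # the only place the catalog is written

worker = threading.Thread(target=indexer)
worker.start()
index_queue.put((1, "some document"))     # caller returns immediately
index_queue.put((2, "another document"))  # indexing lags, conflict-free
index_queue.put(None)
worker.join()
```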
   

Other aspects you should care about:

  *  how to make backups of your system

     FileStorage tends to produce a few huge files
     which are difficult to back up (and restore) with standard
     means (standard incremental backup will not work).

     There is repozo to get efficient (non-standard) incremental
     backups.

  *  if possible, partition your data and put each partition
     in its own storage.

     Your partitions should be self-contained (in order to move them
     around and back them up/restore them individually)

  *  (FileStorage) startup time can be proportional to the storage size
     (when there is no up-to-date index file, e.g. due to an abnormal
     shutdown)

  *  (FileStorage) RAM linear in the number of objects is needed
     (to maintain the oid -> file-position map).

     That is probably not yet a concern for a few 100,000 objects.
     It will become one when you reach a few 100,000,000 objects...
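A back-of-the-envelope check makes the scaling concrete. The ~32 bytes per index entry used here is an assumed figure for illustration, not a measured FileStorage constant:

```python
BYTES_PER_ENTRY = 32                 # assumed cost per oid -> file-position entry
for n in (100_000, 100_000_000):
    mib = n * BYTES_PER_ENTRY / 2**20
    print(f"{n:>11,} objects -> roughly {mib:,.0f} MiB of index RAM")
```

A few MiB for 100,000 objects is negligible; at 100,000,000 objects the same map runs into gigabytes.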


Which storage to consider?
Filestorage?
maybe PGStorage?

I think the only trustworthy storages are FileStorage and
DirectoryStorage (the latter requires a file system optimized for
huge directories, e.g. ReiserFS).

- ACID properties. Is it really ACID, I mean data consistency level
could be compared to a RDB?

You know that ACID is not ACID -- and that even most relational
databases are not truly ACID.

The ZODB does not guarantee the serializable transaction isolation
model, under which the execution of any set of transactions
is equivalent to some sequential execution of those transactions.

A realistic example (which caused a bug in Zope's catalog indexes)
where this fails looks like this:

  Transaction 1:
     deletes a document d from an index document list index[term]
     and deletes index[term] when the list becomes empty

     if len(index[term]) == 1: del index[term]
     else: index[term].remove(d)

  Transaction 2:
     adds a document x to an index document list index[term],
     creating the list if necessary

     if term not in index: index[term] = DocumentList()
     index[term].insert(x)

If these transactions are executed concurrently, x may get lost;
for example, if the document list contains just d.

   In this case, transaction 1 will delete index[term] (because
   it does not yet see the effect of transaction 2).
   Transaction 2 will add x to the old document list,
   which is no longer referenced once transaction 1 commits.

   Note that no sequential execution of T1 and T2 can have this
   result.
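The anomaly can be replayed with plain dicts: each transaction works from its own snapshot read (as ZODB readers do), and the two commits touch *different* persistent objects, so no write conflict is raised:

```python
# Committed state: the index maps term -> document list; the list holds just d.
index = {"term": ["d"]}

# T1's snapshot read: the list has length 1, so T1 decides to drop the entry.
snap1_len = len(index["term"])

# T2 appends x to the *existing* list object (it modifies the DocumentList,
# not the index mapping itself).
old_list = index["term"]
old_list.append("x")

# T1 commits: it modifies the index mapping -- a different object than the
# one T2 touched, so the ZODB detects no write conflict.
if snap1_len == 1:
    del index["term"]

# Result: x went into a list that is no longer reachable from the index.
# No sequential execution of T1 and T2 could produce this outcome.
```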

Note that I had to work a bit to come up with the example above.
The more straightforward implementation:

  Transaction 1:

 index[term].remove(d)
 if not index[term]: del index[term]

would *not* have this problem -- because both transactions
try to modify the same object (index[term]), which is
recognized and prevented by the ZODB (the former bug
in Zope's indexing implementation resulted from the
application specific conflict resolution, which wrongly claimed
to have