Re: [ZODB-Dev] Advice needed
Roché Compaan wrote at 2006-6-25 11:13 +0200:
> > ... Careful design with respect to the granularity and locality of
> > persistent objects: Move groups of large and rarely used attributes
> > out into persistent subobjects.
>
> Will this lead to smaller transaction sizes for objects that store
> large attributes as persistent subobjects?

It can mean fewer large transactions, as you get a large transaction only when you modify the large attribute and not when you make an innocent modification (of a small attribute). It can also considerably reduce load time, as you fetch the subobject with the large attributes only when you need this large content (and not when you happen to access the object for other, unrelated reasons). Your memory consumption can go down as well.

> ... This is interesting! ZCatalog metadata is stored in an IOBTree.
> Are you saying the values in the IOBTree should be wrapped in a class
> that subclasses Persistent, and not stored as raw values?

Yes -- at least as soon as the value gets larger.

> The ZCatalog and indexing in general seem to be the biggest problem
> with ZODB applications.

That's because they have to handle huge amounts of data and objects.

> BTrees do not seem to fix the problem.

If you speak about conflicts: they reduce the conflict probability by about 30 to 80 times (depending on the type). That's not too bad...

> ... Can somebody describe a ZODB indexing implementation

There is no ZODB indexing implementation. There is a ZCatalog one.

> where an application can index new objects concurrently? Given the
> above, this is probably not a stupid question.

We won't be able to ban conflict errors from the ZODB -- not even when we consider only indexing applications. I thought that it might be possible to extend the BTrees conflict resolution, which is now restricted to the leaf level, one level up. This would reduce the conflict probability again by a factor of 50 to 100.
However, conflict resolution is a *very* tricky task and it is quite easy to err and to introduce nasty, highly non-deterministic bugs. I haven't dared to attack this problem...

-- Dieter

___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
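The persistent-wrapper advice above (wrap large IOBTree values in a Persistent subclass rather than storing them raw) can be sketched as follows. This is an illustrative stand-alone sketch, not real ZODB code: the class and attribute names are made up, and plain Python objects stand in for `persistent.Persistent` subclasses and `BTrees.IOBTree.IOBTree` so the example runs anywhere.

```python
# Illustrative sketch (hypothetical names): in a real application
# MetadataRecord would subclass persistent.Persistent and `index`
# would be a BTrees.IOBTree.IOBTree.

class MetadataRecord:
    """Stands in for a Persistent subclass wrapping one large value.

    Because the wrapper is its own persistent object, the BTree bucket
    stores only a small reference to it; the big payload is a separate
    ZODB record, loaded and rewritten only when actually touched.
    """
    def __init__(self, value):
        self.value = value

index = {}  # stands in for an IOBTree keyed by document id
index[42] = MetadataRecord("... a large chunk of metadata ...")

# An update dirties only the small wrapper record, not the bucket
# holding many other entries:
index[42].value = "updated payload"
```

With raw values, every large payload shares a bucket with its neighbours, so any write rewrites (and can conflict on) the whole bucket; the wrapper confines both the transaction size and the conflict surface to one record.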
Re: [ZODB-Dev] Advice needed
--On 25. Juni 2006 21:33:21 +0200 Dieter Maurer [EMAIL PROTECTED] wrote:
> Andreas Jung wrote at 2006-6-24 22:36 +0200:
> > > ... The ZODB model (object data stored in a storage with behaviour
> > > coded in the clients) is powerful enough to simply implement the
> > > relational database data structures: tables and indexes and their
> > > corresponding operations. After this implementation, you could
> > > perform the same complex queries against the records maintained in
> > > these data structures.
> >
> > Of course you *could* do it, but you can also shoot yourself in the
> > knee with a gun.
>
> But, these two things have nothing in common ;-)

Of course they have! Normal people should/would use the tools that are available and that solve their problems. You know very well that *you* can solve any problem with any tools/language, but implementing such functionality is likely beyond the horizon of most developers (I know you're far beyond this horizon)... so I am usually talking about the average programmer/developer.

> And in fact, I plan to do the former at some future time (but doubt
> that I will ever shoot myself in the knee).

I doubt that you have a gun :-)

-aj
Re: [ZODB-Dev] Advice needed
On Fri, 2006-06-23 at 22:03 +0200, Dieter Maurer wrote:
> Adam Groszer wrote at 2006-6-23 14:27 +0200:
> > ... Some fears they are having and I can't find unambiguous
> > information: - Is ZODB a good choice for this app?
>
> It depends... At least careful design is necessary!
>
> The most problematic aspects of the ZODB are write conflicts. When two
> concurrent transactions modify the same object, a write conflict will
> occur unless the object provides application-specific conflict
> resolution (which might resolve some conflicts). This behaviour is
> especially serious when you have expensive (long running) transactions.
> The longer the transaction runs, the higher the risk that it interferes
> with another transaction and the higher the cost of the resulting
> conflict.
>
> This means that you must carefully design your system to reduce the
> risk of write conflicts. You can (and should!) for example use workflow
> to prevent the same application object (read: document, in your case)
> from being modified by concurrent transactions. You are interested in
> this for other reasons as well (you do not want to wipe out the work of
> a colleague by overwriting his changes). This can clear this side
> (application objects) of the front.
>
> However, there are also global objects which can be modified
> concurrently. The most prominent example: catalog data structures
> (read: indexes). They do use application-specific conflict resolution
> -- but it is often not good enough...
>
> For the catalog, you can move indexing operations out to a separate
> thread (done e.g. by the QueueCatalog Zope product). This considerably
> reduces write conflicts, at the cost that indexing operations are no
> longer inline but lag a bit behind.

I am curious what other strategies besides QueueCatalog you employ? Do you ever use multiple backends for your apps? How do you decide that this data belongs in a relational backend? How structured must the data be, or how many records must be written how often?
I find most data is highly structured (fixed schema), but this doesn't make me choose an RDBMS -- the frequency of writes, concurrency and record volume does.

-- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za
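The QueueCatalog strategy mentioned above can be sketched as a plain producer/consumer queue. This is an illustrative sketch, not the actual QueueCatalog code, and all names are made up: instead of updating shared index structures inline (where every writer touches the same hot objects and risks a write conflict), each transaction only appends to a queue, and a single worker drains the queue and applies the index updates serially.

```python
# Illustrative producer/consumer sketch of deferred indexing
# (hypothetical names; not the real QueueCatalog implementation).
import queue
import threading

index_queue = queue.Queue()
index = {}  # stands in for the shared catalog data structure

def request_index(doc_id, text):
    """Called inline by web transactions: a cheap, conflict-free append."""
    index_queue.put((doc_id, text))

def indexing_worker():
    """Single consumer: applies updates serially, so no two writers
    ever modify the shared index concurrently."""
    while True:
        item = index_queue.get()
        if item is None:          # sentinel: shut down
            break
        doc_id, text = item
        index[doc_id] = text      # real code would update word indexes
        index_queue.task_done()

worker = threading.Thread(target=indexing_worker)
worker.start()
request_index(1, "first document")
request_index(2, "second document")
index_queue.put(None)             # tell the worker to stop
worker.join()
```

The trade-off is exactly the one Dieter names: the queue append cannot conflict, but index updates are no longer visible within the writing transaction; they lag until the worker catches up.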
Re: [ZODB-Dev] Advice needed
On Sat, 2006-06-24 at 09:24 +0200, Andreas Jung wrote:
> --On 24. Juni 2006 08:53:43 +0200 Roché Compaan [EMAIL PROTECTED] wrote:
> > I am curious what other strategies besides QueueCatalog you employ?
> > Do you ever use multiple backends for your apps? How do you decide
> > that this data belongs in a relational backend? How structured must
> > the data be, or how many records must be written how often? I find
> > most data is highly structured (fixed schema), but this doesn't make
> > me choose an RDBMS -- the frequency of writes, concurrency and record
> > volume does.
>
> It is often the case that you have more complex data models that really
> require an RDBMS. It is often the case that you have to perform complex
> queries on your data. The ZCatalog is often just too weak and too slow
> for such apps.
>
> Our CMS (to which Dieter often refers) uses the ZODB to store large
> amounts of SGML data. Since the documents fit perfectly inside a
> hierarchy, the ZODB is the perfect choice. However, we have some apps
> around the CMS that provide additional functionality or use the CMS for
> a different purpose. Some of these apps use the content from the ZODB
> but store their metadata in Postgres. One not unimportant advantage is
> that other people can run their own reports etc. using a Postgres
> client, without approaching me to write a script or something in Python
> to get the requested data out of Zope.
>
> One particular app that I have been working on uses very complex
> queries with lots of joins etc. It would be hard to implement such
> queries on top of the ZODB/ZCatalog. Another point is performance: this
> app often has to perform a lot of insert/update/delete operations
> within one transaction (up to 1000 modifications). Postgres takes
> perhaps 5 seconds for such a complex operation. You will never reach
> that performance with Zope... touching 1000 objects and reindexing them
> will take much longer (unless you adjust your data model to the ZODB's
> needs for performance reasons).
Thanks for sharing this example; it confirms the usefulness of a hybrid backend. In a way it's a sanity check for me.

> So what I want to say is: the bullheaded idea to stick everything into
> the ZODB or into an RDBMS is just stupid. Smart people think about the
> data storage before starting a project.

I agree wholeheartedly that it is stupid to put everything in one backend if the data model requires a hybrid backend.

> That also reminds me of some postings on the Plone mailing list where
> non-technical people put all their stuff inside without knowing what's
> happening under the hood, and wonder why everything explodes or becomes
> horribly slow at some point.

I think the difference here is that technical people want a fundamental understanding of what is going on under the hood so that they can better explain what they already observe.

-- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za
Re: [ZODB-Dev] Advice needed
Roché Compaan wrote at 2006-6-24 08:53 +0200:
> ... I am curious what other strategies besides QueueCatalog you employ?

Careful design with respect to the granularity and locality of persistent objects:

  * Move groups of large and rarely used attributes out into persistent
    subobjects.
  * Move groups of small and often modified attributes out into
    persistent subobjects.
  * If some attributes are usually used together, keep them in the same
    persistent subobject.
  * Don't group large, randomly accessed mass data into the same
    persistent object (as this stupid ZCatalog metadata implementation
    does). Use persistent wrappers instead.

> Do you ever use multiple backends for your apps?

I have the tendency to keep everything in the ZODB -- but I have colleagues fond of relational databases who do not let me... Anyway: not even I would try to put the telephone book for Berlin or even New York into the ZODB; I would use a relational database...

> How do you decide that this data belongs in a relational backend? How
> structured must the data be, or how many records must be written how
> often?

I do not have a rigid catalog of criteria...

Relational databases handle records with simple field types well. Their world quickly becomes nasty when the fields are not of simple type but structured, or lists/sets, or objects with interesting behaviour (rather than pure data). As I explained in a former message, relational databases are also more efficient at searching (they filter on the server rather than on the client). Thus, the need for efficient searches (over the datatypes supported by relational databases) indicates the use of such a beast.

On the other hand, I would not put things into a relational database that it cannot do anything with: e.g. blobs, complex data structures. I would use the ZODB for this. In between there is a large grey area where I tend to use the ZODB and not the relational database.
> I find most data is highly structured (fixed schema), but this doesn't
> make me choose an RDBMS -- the frequency of writes, concurrency and
> record volume does.

Sure, you are right! As explained, relational databases exploit the highly-structured property while the ZODB ignores it. This may not be relevant, though, if your application does not perform mass operations on the data.

-- Dieter
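The granularity rules Dieter lists above (group attributes by size and by how often they change together) can be sketched in a small example. All class names here are hypothetical, and plain classes stand in for `persistent.Persistent` subclasses so the sketch runs stand-alone; in real ZODB code each class below would be its own persistent record.

```python
# Illustrative sketch of granularity-driven object layout
# (hypothetical names; plain classes stand in for Persistent).

class LargeContent:               # would subclass persistent.Persistent
    """Large, rarely used attributes: fetched only on demand."""
    def __init__(self, body, attachments):
        self.body = body
        self.attachments = attachments

class Counters:                   # would subclass persistent.Persistent
    """Small, often modified attributes: cheap, frequent writes that
    never rewrite the large content record."""
    def __init__(self):
        self.hits = 0

class Document:                   # would subclass persistent.Persistent
    """Attributes usually used together stay on the document itself;
    the other two groups live in separate persistent records."""
    def __init__(self, title, body):
        self.title = title                        # small, often read
        self._content = LargeContent(body, [])    # separate record
        self._counters = Counters()               # separate record

doc = Document("Advice needed", "a very large SGML body ...")
doc._counters.hits += 1   # dirties only the tiny Counters record
```

An "innocent" modification such as the hit-count update above produces a tiny transaction and cannot conflict with (or force a reload of) the large body.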
Re: [ZODB-Dev] Advice needed
--On 24. Juni 2006 21:08:03 +0200 Dieter Maurer [EMAIL PROTECTED] wrote:
> Andreas Jung wrote at 2006-6-24 09:24 +0200:
> > ... One particular app that I have been working on uses very complex
> > queries with lots of joins etc. It would be hard to implement such
> > queries on top of the ZODB/ZCatalog.
>
> I disagree with you (partially). The ZODB model (object data stored in
> a storage with behaviour coded in the clients) is powerful enough to
> simply implement the relational database data structures: tables and
> indexes and their corresponding operations. After this implementation,
> you could perform the same complex queries against the records
> maintained in these data structures.

Of course you *could* do it, but you can also shoot yourself in the knee with a gun. You could also implement Zope on top of a Turing machine, but nobody would do that :-)

> > Another point is performance: this app often has to perform a lot of
> > insert/update/delete operations within one transaction (up to 1000
> > modifications).
>
> You would not believe how many objects your TextIndexNG modifies when
> it indexes a single document. Thousands is not very much ;-)

Patches are welcome :-)

-aj
Re: [ZODB-Dev] Advice needed
Adam Groszer wrote at 2006-6-23 14:27 +0200:
> ... Some fears they are having and I can't find unambiguous
> information: - Is ZODB a good choice for this app?

It depends... At least careful design is necessary!

The most problematic aspects of the ZODB are write conflicts. When two concurrent transactions modify the same object, a write conflict will occur unless the object provides application-specific conflict resolution (which might resolve some conflicts). This behaviour is especially serious when you have expensive (long running) transactions. The longer the transaction runs, the higher the risk that it interferes with another transaction and the higher the cost of the resulting conflict.

This means that you must carefully design your system to reduce the risk of write conflicts. You can (and should!) for example use workflow to prevent the same application object (read: document, in your case) from being modified by concurrent transactions. You are interested in this for other reasons as well (you do not want to wipe out the work of a colleague by overwriting his changes). This can clear this side (application objects) of the front.

However, there are also global objects which can be modified concurrently. The most prominent example: catalog data structures (read: indexes). They do use application-specific conflict resolution -- but it is often not good enough...

For the catalog, you can move indexing operations out to a separate thread (done e.g. by the QueueCatalog Zope product). This considerably reduces write conflicts, at the cost that indexing operations are no longer inline but lag a bit behind.

If you implement other global objects with high write probability, you need to either implement application-specific conflict resolution (which is not always easy) or carefully reduce the conflict probability (e.g. by relaying the changes to a separate thread).
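The "application-specific conflict resolution" mentioned above follows ZODB's `_p_resolveConflict` protocol: when a write conflict is detected, the storage hands the persistent class three states (the one both writers started from, the one already committed, and the conflicting new one) and asks it to produce a merged state. The classic resolvable case is a counter, as in `BTrees.Length.Length`. The stand-alone function below mimics that merge logic with illustrative names; a real implementation would be a `_p_resolveConflict` method on a `Persistent` subclass and would raise `ZODB.POSException.ConflictError` on failure.

```python
# Illustrative stand-alone sketch of counter-style conflict resolution
# (hypothetical names; modelled on ZODB's _p_resolveConflict protocol).

class UnresolvableConflict(Exception):
    """Stands in for ZODB.POSException.ConflictError."""

def resolve_counter_conflict(old_state, saved_state, new_state):
    """Merge two concurrent increments of a counter.

    old_state   -- state both transactions started from
    saved_state -- state the first (already committed) writer produced
    new_state   -- state the second (conflicting) writer produced

    Each state is a dict like {'value': int}. The merged state applies
    both deltas; anything less commutative than a counter should
    usually refuse to resolve rather than guess.
    """
    try:
        delta_saved = saved_state['value'] - old_state['value']
        delta_new = new_state['value'] - old_state['value']
    except (KeyError, TypeError):
        raise UnresolvableConflict('unexpected state shape')
    return {'value': old_state['value'] + delta_saved + delta_new}

# Both writers started at 10; one committed 13 (+3), the other
# attempted 11 (+1), so the merged state is 14:
merged = resolve_counter_conflict({'value': 10}, {'value': 13}, {'value': 11})
```

This is exactly why Dieter warns that conflict resolution "is not always easy": the merge must be correct for *every* interleaving, which only works when the operations commute.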
Other aspects you should care about:

  * How to make backups of your system. FileStorage tends to produce a
    few huge files which are difficult to back up (and restore) with
    standard means (standard incremental backup will not work). There is
    repozo to get efficient (non-standard) incremental backup.
  * If possible, partition your data and put each partition in its own
    storage. Your partitions should be self-contained (in order to move
    them around and back up/restore them individually).
  * (FileStorage) Startup time can be proportional to the storage size
    (when there is no up-to-date index file, e.g. due to an abnormal
    shutdown).
  * (FileStorage) RAM linear in the number of objects is needed (to
    maintain the oid -> file position map). That is probably not yet a
    concern for a few 100,000 objects. It will become one when you get a
    few 100,000,000 objects...

> Which storage to consider? FileStorage? Maybe PGStorage?

I think the only trustworthy storages are FileStorage and DirectoryStorage (the latter requires a file system optimized for huge directories, e.g. ReiserFS).

> - ACID properties. Is it really ACID? I mean, can the data consistency
>   level be compared to an RDB?

You know that ACID is not ACID -- and that even most relational databases are not truly ACID. The ZODB does not guarantee the sequential transaction isolation model, under which the execution of any set of transactions is equivalent to some sequential execution of these transactions. A realistic example (which caused a bug in Zope's catalog indexes) where this fails looks like this:

Transaction 1 deletes a document d from an index document list index[term] and deletes index[term] when the list becomes empty:

    if len(index[term]) == 1:
        del index[term]
    else:
        index[term].remove(d)

Transaction 2 adds a document x to an index document list index[term] and creates the list, if necessary:
    if term not in index:
        index[term] = DocumentList()
    index[term].insert(x)

If these transactions are executed concurrently, x may get lost; for example, if the document list contains just d, transaction 1 will delete index[term] (because it does not yet see the effect of transaction 2), while transaction 2 will add x to the old document list, which is no longer referenced as soon as transaction 1 commits. Note that no sequential execution of T1 and T2 can have this result.

Note that I had to work a bit to come up with the example above. The more straightforward implementation of transaction 1:

    index[term].remove(d)
    if not index[term]:
        del index[term]

would *not* have this problem -- because then both transactions try to modify the same object (index[term]), which is recognized and prevented by the ZODB (the former bug in Zope's indexing implementation resulted from the application-specific conflict resolution, which wrongly claimed to have
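The lost-update scenario above can be simulated stand-alone. In this sketch, plain dicts and lists stand in for the BTree and the DocumentList, and per-transaction deep copies stand in for the snapshots each ZODB transaction works on; since T1 modifies only the mapping and T2 modifies only the list, no shared object is modified by both, which is why the ZODB raises no conflict.

```python
# Stand-alone simulation of the non-serializable interleaving
# described above (illustrative stand-ins, not ZODB machinery).
import copy

index = {'term': ['d']}           # starting state: the list holds only d

# Both transactions snapshot the same starting state:
snapshot_t1 = copy.deepcopy(index)
snapshot_t2 = copy.deepcopy(index)

# T1: delete d; since the list holds only d, it deletes the whole
# entry -- modifying the mapping, never the list object.
if len(snapshot_t1['term']) == 1:
    del snapshot_t1['term']

# T2: add x; the entry exists in its snapshot, so it reuses the old
# list -- modifying the list object, never the mapping.
if 'term' not in snapshot_t2:
    snapshot_t2['term'] = []
snapshot_t2['term'].insert(0, 'x')

# T1 commits first: its mapping change wins, and T2's insertion landed
# on a list that is no longer reachable from the mapping.
index = snapshot_t1              # 'term' is gone -- and x with it
```

No sequential execution of T1 and T2 ends with the term absent *and* x inserted: run T1 then T2 and the term exists holding [x]; run T2 then T1 and it exists holding [x, d] minus d, i.e. [x]. The concurrent outcome is therefore non-serializable, exactly as the message argues.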