[ZODB-Dev] Re: [Zope3-dev] Re: Community opinion about search+filter

Martijn Faassen Sun, 25 Mar 2007 08:33:44 -0800

Hey Jim,

Jim Fulton wrote:

On Mar 25, 2007, at 3:01 AM, Adam Groszer wrote:
MF> I think one of the main limitations of the current catalog (and
MF> hurry.query) is efficient support for sorting and batching the query
MF> results. The Zope 3 catalog returns all matching results, which can then
MF> be sorted and batched. This will stop being scalable for large
MF> collections. A relational database is able to do this internally, and is
MF> potentially able to use optimizations there.
What evidence do you have to support this assertion?

I have the strong suspicion that modern relational databases are currently better able to scale at queries using LIMIT and ORDER BY than the Zope 3 catalog. I cannot back this up as I haven't done measurements. Perhaps you have done so?

* Do you estimate the performance of the Zope 3 catalog to be equivalent to the performance of a modern relational database system for queries that need to sort and batch their results?

* If so, do you think it's just as easy for a developer to accomplish such equivalent performance with the Zope 3 catalog as it is with a relational database?


I've made a number of assertions:

a) one of the main limitations of the current catalog and hurry.query is efficient support for sorting and batching.

b) the Zope 3 catalog returns all matching results, which can then be sorted and batched. This will stop being scalable for large collections.

I'll amend b) by saying 'This will stop being scalable for large result sets'. I agree that b) as stated above is incorrect as the result set might be small, but I intended the amendment.

c) A relational database is able to do sorting and batching (limit queries) internally, and is potentially able to use optimizations here.
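
To make c) concrete, here is a minimal sketch of a sorted, batched query in a relational database, using Python's standard sqlite3 module. The table `docs`, the index `docs_title` and the helper `fetch_batch` are hypothetical names chosen for illustration; nothing here has been measured, and whether a given database actually uses the index to avoid a full sort depends on its query planner.

```python
import sqlite3

# Hypothetical schema: one table with an index on the sort column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE INDEX docs_title ON docs (title)")
conn.executemany(
    "INSERT INTO docs (title) VALUES (?)",
    [(f"title-{i:05d}",) for i in range(1000)],
)


def fetch_batch(conn, start, size):
    """Return one batch of rows, sorted and sliced inside the database."""
    cursor = conn.execute(
        "SELECT id, title FROM docs ORDER BY title LIMIT ? OFFSET ?",
        (size, start),
    )
    return cursor.fetchall()


first_batch = fetch_batch(conn, 0, 10)
last_batch = fetch_batch(conn, 990, 10)
```

The point of the sketch is only that sorting and batching are part of the query itself, so the database has the opportunity to optimize them; the application never sees the full result set.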


Which of these assertions are false?

Don't you think a relational database system that has support for sorting and batching built into its query API can at the very least more easily use approaches to reduce sorting cost, by rewriting the query, caching, and potentially employing special indexes?

We did some literature search on this a few years ago and found no special trick to avoid sorting costs.

I am at least cursorily aware of challenges surrounding efficient querying and batching. I am not looking for a special trick or magic bullet. I'd just like more help in avoiding sorting cost in a typical situation where results are displayed in a batched format.

If a catalog query returns 1 million results, which I want to show in batches of 10, sorted by some property of the results, I would like to reduce the costs. Currently the pattern I (and I imagine others) employ is to re-execute the query and then sort these results in memory for each batch, for each request.
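
The pattern described above can be sketched in plain Python, next to a generic N-best alternative built on the standard heapq module. This is an illustration under assumptions, not the catalog's or hurry.query's API: `results` stands in for whatever a re-executed query returns, and `batch_full_sort` and `batch_n_best` are hypothetical helper names.

```python
import heapq
from operator import itemgetter

# Stand-in for the result of a re-executed query (assumed data).
# 7919 is coprime to 1000, so every "rank" value is unique.
results = [{"id": i, "rank": (i * 7919) % 1000} for i in range(1000)]
by_rank = itemgetter("rank")


def batch_full_sort(results, key, start, size):
    """Current pattern: sort the whole result set, then slice one batch."""
    return sorted(results, key=key)[start:start + size]


def batch_n_best(results, key, start, size):
    """N-best: keep only the smallest start + size items, then slice."""
    return heapq.nsmallest(start + size, results, key=key)[start:]


full = batch_full_sort(results, by_rank, 20, 10)
best = batch_n_best(results, by_rank, 20, 10)
```

Both helpers return the same batch. The N-best variant avoids ordering the entire result set when the requested batch is near the front, but it still has to visit every result, and its advantage shrinks as `start` grows.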


[you list some approaches to reduce sorting cost]

I would like some system that helps me reduce some of these costs, using the approaches you list, or at least some caching somewhere. I would imagine a relational database for instance can employ caching of result sets, so that if no writes occurred, a second LIMIT query asking for a different range will return results a lot faster.

Apparently the catalog does support N-best, you state later in this thread. How does one use this support? Can I add it to hurry.query somehow?
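
The kind of result-set caching imagined here could look roughly like the sketch below. It is an assumption-laden illustration, not a description of how any particular database or the catalog works: `SortedResultCache`, `run_query` and `invalidate` are hypothetical names, and a real system would need a reliable way to learn that a write occurred.

```python
class SortedResultCache:
    """Cache sorted result sets until a write invalidates them."""

    def __init__(self, run_query, sort_key):
        self._run_query = run_query  # callable: query -> iterable of results
        self._sort_key = sort_key    # callable: result -> sort value
        self._sorted = {}            # query -> fully sorted result list
        self.executions = 0          # how often the query really ran

    def invalidate(self):
        """Call after any write; all cached result sets are dropped."""
        self._sorted.clear()

    def batch(self, query, start, size):
        """Return one batch, executing and sorting only on a cache miss."""
        if query not in self._sorted:
            self.executions += 1
            self._sorted[query] = sorted(
                self._run_query(query), key=self._sort_key
            )
        return self._sorted[query][start:start + size]


# Toy data and query: "query" is a divisor, results are its multiples.
data = list(range(100, 0, -1))
cache = SortedResultCache(
    lambda divisor: [x for x in data if x % divisor == 0],
    sort_key=lambda x: x,
)
page_one = cache.batch(5, 0, 3)
page_two = cache.batch(5, 3, 3)
```

With no writes in between, the second batch is served from the cached sorted list and the query runs only once. The trade-off is memory: the full sorted result set is held for every cached query.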

Perhaps all this is not the responsibility of the catalog itself, but of a system surrounding it. As long as it's obviously there for people to use.


Perhaps however I am seeing problems that aren't there?

Do you think there is no problem and we have parity with relational database implementations here?


Do you think the current situation cannot be improved much further?

Do you think any further improvements are not worth the costs?

Regards,

Martijn

_______________________________________________
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev
