> Christoph Kiehl wrote:
> But as you mentioned in your previous mail there are some
> problematic queries which are way to slow like ChildAxisQuery
> or DescendantSelfAxisQuery. All queries that need to read
> lucene documents instead of just using a query get pretty
> slow with large repositories. But I didn't see a way yet how
> to substantially improve performance while using lucene. I
> even thought of using some other kind of indexing since lucene ...
>
> Internally we use a specific mixin for our documents as a
> workaround. This way I can avoid ChildAxisQueries and the
> like. I just query for "//element(*, foo-mix:document)[...]"
> for example. But that is just a dirty workaround.
This argument holds for my solution as well :-(

> I would really like to find a solution to those problems.
> Maybe we should use some additional kind of index for
> resolving parent-child relations. Do you have any ideas yet
> how improve performance in those areas?

AFAICS, when we want to solve it within lucene with querying, we will have a
trade-off between "fast searching" and "fast moving of nodes" (I'll get back
to this one).

Currently, we are building a layer on top of JackRabbit that, amongst many
other things, at least needs to be able to:

1) port legacy code which had slide as repository
2) show all documents/nodes through faceted navigation

Since we have quite a few large projects running with slide as repository,
and since we use a custom slide/lucene index to be able to search fast, I
need some queries in JackRabbit to be much faster than currently possible.
Obviously, since (2) must be implemented, almost every call to JackRabbit
will be a search.

A very basic search that we run hundreds of times for legacy slide projects
would be:

/documents/en/news//[EMAIL PROTECTED] order by @modificationDate

Typically, a news folder contains tens of thousands of items, and this query
is not feasible with the current JackRabbit impl (at least, my experience is
that for > 10.000 docs this query takes multiple seconds, while I need the
result in < 50ms (50 is really the max IMO)).

Now, for some queries that I control exactly, so I know I won't get queries
like /documents/en[1]/news[1] or documents/[EMAIL PROTECTED]/news or
documents/*/news, but only queries that look like
/nodename/nodename/nodename/**[......], I chose to translate the initial
path part into something like:

TermQuery(new Term(FieldNames.INITIAL_PATH, path))

where for example path='/documents/en/news' (see the sketch at the end of
this mail).

Obviously, this only works when I index a node's path in some lucene field.
So a node with path /documents/en/news/2007/10/14/item.xml would have a
lucene Field that contains the terms

'/documents/en/news/2007/10/14/item.xml'
'/documents/en/news/2007/10/14'
'/documents/en/news/2007/10'
'/documents/en/news/2007'
'/documents/en/news'
'/documents/en'
'/documents'

Obviously, this results in a very fast, simple lucene search for "give me
all nodes starting with path x", because it is just one simple TermQuery.
The major disadvantage is that moving a node is now very costly, because it
requires re-indexing the whole tree below that node. Also, I can only use
this for queries with a fixed 'start-path', though it might be enhanced to
support '*' and /[EMAIL PROTECTED]

Bottom line: I haven't found the holy grail either, but at least I get
responses within milliseconds for hundreds of thousands of nodes :-)

I am not sure there is a solution that gives fast searching for
DescendantSelfAxisQuery and fast moving of nodes at the same time. I chose
to be able to search fast, and hope people won't be moving the node directly
under the root too many times :-)

Regards Ard
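
For illustration, a minimal sketch of what the path-prefix indexing and the
single-TermQuery lookup described above could look like, assuming the plain
Lucene 2.x API; the field name INITIAL_PATH, the modificationDate handling
and the class name are placeholders here, not JackRabbit's actual internals:

import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;

public class InitialPathSketch {

    /** Placeholder for the FieldNames.INITIAL_PATH field mentioned above. */
    static final String INITIAL_PATH = "initialPath";

    /**
     * '/documents/en/news/2007/10/14/item.xml' becomes a list of that path
     * plus all of its ancestor paths down to '/documents'.
     */
    static List<String> ancestorPaths(String path) {
        List<String> terms = new ArrayList<String>();
        String p = path;
        while (p.lastIndexOf('/') > 0) {
            terms.add(p);
            p = p.substring(0, p.lastIndexOf('/'));
        }
        terms.add(p); // the single-segment path, e.g. '/documents'
        return terms;
    }

    /**
     * Index one node: every ancestor path is added as an untokenized term,
     * so a later "everything below path x" lookup is a single TermQuery.
     * The modification date is indexed as a fixed-width string (for example
     * "20071014093000") so that lexicographic order equals date order.
     */
    static void indexNode(IndexWriter writer, String path, String modificationDate)
            throws Exception {
        Document doc = new Document();
        for (String term : ancestorPaths(path)) {
            doc.add(new Field(INITIAL_PATH, term,
                    Field.Store.NO, Field.Index.UN_TOKENIZED));
        }
        doc.add(new Field("modificationDate", modificationDate,
                Field.Store.YES, Field.Index.UN_TOKENIZED));
        writer.addDocument(doc);
    }

    /**
     * "Give me all nodes below startPath, newest first": one TermQuery on
     * the ancestor-path field plus a Lucene sort, no path traversal at all.
     */
    static Hits findBelow(IndexSearcher searcher, String startPath)
            throws Exception {
        TermQuery query = new TermQuery(new Term(INITIAL_PATH, startPath));
        Sort byDate = new Sort(
                new SortField("modificationDate", SortField.STRING, true));
        return searcher.search(query, byDate);
    }
}

The price, as noted above, is the move case: renaming or moving a node means
every descendant's ancestor-path terms change, so the whole subtree has to be
re-indexed.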
