Hello Ivan,

That was cool news! Thanks! :) The timings are surprisingly good: about 10 million docs sorted in roughly 20 s. Cool! It also looks like the sorting algorithm employed by Lucene is quite economical with memory.

Not supporting multiple fields is in fact another limitation of my patch. I don't need it, so I didn't implement it :) Implementing it probably has to be done manually: employ a FieldSelector that fetches the whole bunch of fields; change the compare(ScoreDoc scoreDoc1, ScoreDoc scoreDoc2) method so that it compares docs by that bunch of fields instead of a single field (there should also be an array of Asc/Desc flags somewhere, which makes this more complicated); that's it.

I don't understand yet why Sort(SortField[] fields) didn't give the same performance when fields.length == 1. We should probably dig into the Lucene code to find out. In the case of several fields I can imagine why this approach would be less efficient: at least N*2 Document reads (by StoredFieldComparator.sortValue) will be needed to compare 2 documents (N is the length of the fields array). One read per document with an appropriate FieldSelector is likely to perform better.

Anyway, I do think StoredFieldSortFactory's approach could be successfully applied to multiple fields, but I'm not going to implement it yet. Maybe you? :)

Regards,
Artem

IV> Hi Artem,
IV> Thank you very much for your mails :)
IV> So first I have to tell you that your patch works perfectly even with
IV> very big indexes - 40 GB (you can see the results below).
IV> The reason I had bad test results last time is that I made a small
IV> change (but I cannot understand why this change caused a problem - in my
IV> opinion it should not have such a big effect on performance).
IV> So the change that I made is - I added a new method in the class
IV> StoredFieldSortFactory. It is the same as the create(String sortFieldName,
IV> boolean sortDescending) method, but instead of wrapping the SortField it
IV> returns it directly, and in my class I wrap this object in a Sort one.
IV> Here is the code:
IV>
IV> public static SortField createSortField(String sortFieldName,
IV>         boolean sortDescending) {
IV>     return new SortField(sortFieldName, instance, sortDescending);
IV> }
IV>
IV> I do this because we have to support sorting on multiple fields, and I
IV> obtain all SortField objects in a loop and then create a Sort out of them:
IV>
IV> Sort sort = new Sort(sortFields);
IV>
IV> In my tests that had very bad results (time for searches was more
IV> than 5 mins), in all the tests I used sorting ONLY BY ONE FIELD (meaning
IV> the array sortFields always had length 1).
IV> But I still used the constructor Sort(SortField[]) and not
IV> Sort(SortField) as originally in your code in the method
IV> StoredFieldSortFactory.create(..).
IV> Do you think this is the reason for the poor performance?
IV> If so, COULD YOU PLEASE TELL ME how to use your patch for sorting on
IV> multiple stored fields?
IV> Here are the test results of your patch with different indexes (the tests
IV> use the code just as you recommend - with your
IV> create(..) method that uses the constructor Sort(SortField)):
IV> - CPU - Intel Core2Duo, max memory allowed to the process that does the
IV> searching - 1GB (not all of it used)
IV> **********************************************************************
IV> - index size 3,3 GB, about 486 410 documents (all the testing searches
IV> include all documents);
IV> ______________________________________________________________________
IV> - field size - it is file name and varies - in my opinion 15 - 30 chars
IV> average.
IV> - search time (ASC) - 1,312 s, memory usage - 71MB
IV> - search time (DSC) - 1,281 s, memory usage - 71MB
IV> - field size - it is abs path name and varies - in my opinion 60 - 90
IV> chars average.
IV> - search time (ASC) - 1,344 s, memory usage - 71MB
IV> - search time (DSC) - 1,328 s, memory usage - 71MB
IV> - field size - it is file size and varies - in my opinion 3 - 7 chars
IV> average.
IV> - search time (ASC) - 1,313 s, memory usage - 71MB
IV> - search time (DSC) - 1,312 s, memory usage - 71MB
IV> **********************************************************************
IV> - index size 21,4 GB, about 376 999 documents (all the testing searches
IV> include all documents);
IV> ______________________________________________________________________
IV> - field size - it is file name and varies - in my opinion 15 - 30 chars
IV> average.
IV> - search time (ASC) - 0,875 s, memory usage - 371MB
IV> - search time (DSC) - 0,828 s, memory usage - 371MB
IV> - field size - it is abs path name and varies - in my opinion 60 - 90
IV> chars average.
IV> - search time (ASC) - 0,844 s, memory usage - 371MB
IV> - search time (DSC) - 0,813 s, memory usage - 371MB
IV> - field size - it is file size and varies - in my opinion 3 - 7 chars
IV> average.
IV> - search time (ASC) - 0,813 s, memory usage - 371MB
IV> - search time (DSC) - 0,797 s, memory usage - 371MB
IV> **********************************************************************
IV> - index size 42,9 GB, about 10 944 918 documents (all the testing
IV> searches include all documents);
IV> ______________________________________________________________________
IV> - field size - it is file name and varies - in my opinion 15 - 30 chars
IV> average.
IV> - search time (ASC) - 21,905 s, memory usage - 625MB
IV> - search time (DSC) - 21,781 s, memory usage - 625MB
IV> - field size - it is abs path name and varies - in my opinion 60 - 90
IV> chars average.
IV> - search time (ASC) - 21,874 s, memory usage - 625MB
IV> - search time (DSC) - 21,749 s, memory usage - 625MB
IV> - field size - it is file size and varies - in my opinion 3 - 7 chars
IV> average.
IV> - search time (ASC) - 21,687 s, memory usage - 625MB
IV> - search time (DSC) - 21,812 s, memory usage - 625MB
IV> THANK YOU VERY MUCH,
IV> Ivan

-- 
Best regards,
Artem
mailto:[EMAIL PROTECTED]
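
P.S. The multi-field comparison described in the reply above could look roughly like the sketch below. It is only an illustration under assumptions, not the patch itself: it does not touch Lucene at all. FetchedDoc, MultiFieldComparator and sortedIds are hypothetical names; FetchedDoc stands in for "all sort fields of one hit, fetched in a single read", and the boolean array plays the role of the Asc/Desc flags. Values are compared as plain strings, so a numeric field such as file size would need padding or parsing to sort numerically.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class MultiFieldSortSketch {

    // Hypothetical holder: the stored values of one hit, one per sort field,
    // assumed to have been loaded in a single document read.
    static final class FetchedDoc {
        final int docId;
        final String[] values;

        FetchedDoc(int docId, String... values) {
            this.docId = docId;
            this.values = values;
        }
    }

    // Hypothetical comparator: walks the fields in order and uses the first
    // one that differs, flipping the result when that field is descending.
    static final class MultiFieldComparator implements Comparator<FetchedDoc> {
        private final boolean[] descending;

        MultiFieldComparator(boolean... descending) {
            this.descending = descending;
        }

        @Override
        public int compare(FetchedDoc a, FetchedDoc b) {
            for (int i = 0; i < descending.length; i++) {
                int c = a.values[i].compareTo(b.values[i]);
                if (c != 0) {
                    return descending[i] ? -c : c;
                }
            }
            // All fields equal: fall back to doc id for a deterministic order.
            return Integer.compare(a.docId, b.docId);
        }
    }

    // Sorts a copy of the hits and returns their doc ids in sorted order.
    static List<Integer> sortedIds(List<FetchedDoc> docs, boolean... descending) {
        List<FetchedDoc> copy = new ArrayList<>(docs);
        copy.sort(new MultiFieldComparator(descending));
        List<Integer> ids = new ArrayList<>();
        for (FetchedDoc d : copy) {
            ids.add(d.docId);
        }
        return ids;
    }
}
```

With this shape each document is read once and then compared in memory, which is the reason a single read with a FieldSelector is expected to beat N*2 reads per comparison.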