Hi! We have switched from Lucene 3.6 to Lucene 4.7 or later (Java 7), and we are also experiencing a distinct slowdown on the same dataset. We run the software under Windows Server 2008 R2.
In our case, we have identified that there are a lot more IO calls (that is, the number of times the buffer is refilled in IndexInput). It becomes more serious when you have more than one segment. This is not intuitive, since the data on disk is more compact than before (thanks to the new codecs/compression techniques, I guess).

It would be useful if you could also compare the number of bytes read during each query between the two Lucene versions. You could also measure the total time taken by the IOs and compare it to the total time taken by the query. In our configuration, queries are "IO bound" and the total search time is proportional to the number of IOs made by the IndexInputs (of course, this doesn't apply to very small queries).

Alessandro

-----Original Message-----
From: Shlomit Rosen [mailto:shlom...@il.ibm.com]
Sent: Tuesday, 17 June 2014 10:37
To: java-user@lucene.apache.org
Subject: Search degradation on Windows when upgrading from lucene 3.6 to lucene 4.7.2

Hi,

We are in the process of upgrading from Lucene 3.6.0 to Lucene 4.7.2, and our tests show a significant search degradation on the Windows platform. While trying to figure this out, we noticed a couple of points. Any suggestions/thoughts will be greatly appreciated. Thanks!

1) Running search on an optimized collection.
Our first run on a Windows machine showed the following results:

Lucene 3.6: 115 queries/sec
Lucene 4.7.2: 74 queries/sec

Looking at the collections themselves, we got the following characterization:

Lucene 3.6:
General Index Information:
==========================
Num docs: 10485760
Num deleted docs: 0
Deletion rate: 0%
Number of files in FOLDER: 116
Total size of files in FOLDER: 81558862032 bytes (75.96 GB)
Commit Point Information:
=========================
Version: 1399567203042
Timestamp: 1399593668185
Generation: 6018
Segments file name: segments_4n6
Number of segments: 32
Committed size: 81216915273 bytes (75.64 GB)
Number of files in COMMIT POINT: 89
Total size of files in COMMIT POINT: 81216923390 bytes (75.64 GB)

Lucene 4.7.2:
General Index Information:
==========================
Num docs: 10485760
Num deleted docs: 0
Deletion rate: 0%
Number of files in FOLDER: 301
Total size of files in FOLDER: 71019073768 bytes (66.14 GB)
Commit Point Information:
=========================
Generation: 4518
Segments file name: segments_3hi
Number of segments: 38
Committed size: 70635339707 bytes (65.78 GB)
Number of files in COMMIT POINT: 115
Total size of files in COMMIT POINT: 70635341223 bytes (65.78 GB)

We saw that the collection created by Lucene 4.7.2 was about 10 GB smaller, but it had more segments (38 vs. 32). We thought that the extra segments might account for the search degradation, so we decided to run an optimization on the 4.7.2 index before rerunning the search test.
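To put the first comparison in numbers, here is a small worked calculation over the figures reported above. It assumes (an assumption, not stated in the report) that the "GB" values in the index statistics are binary gigabytes, i.e. bytes divided by 2^30, which is what reproduces the printed 75.96 GB and 66.14 GB. The class name UpgradeDelta is hypothetical.

```java
import java.util.Locale;

/**
 * Hypothetical helper: relative throughput change and index-size difference
 * for the Lucene 3.6 vs 4.7.2 figures in the report.
 * Assumption: the reported "GB" values are bytes / 2^30.
 */
final class UpgradeDelta {
    static final double GIB = 1024.0 * 1024.0 * 1024.0;

    /** Relative change from before to after, in percent. */
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    /** Converts a byte count to binary gigabytes. */
    static double toGiB(long bytes) {
        return bytes / GIB;
    }

    public static void main(String[] args) {
        double qps36 = 115;                 // Lucene 3.6, queries/sec
        double qps472 = 74;                 // Lucene 4.7.2, queries/sec
        long folder36 = 81_558_862_032L;    // Lucene 3.6, total size of files in FOLDER
        long folder472 = 71_019_073_768L;   // Lucene 4.7.2, total size of files in FOLDER

        // Locale.ROOT keeps '.' as the decimal separator regardless of the machine locale.
        System.out.printf(Locale.ROOT, "throughput change: %.1f%%%n",
                percentChange(qps36, qps472));
        System.out.printf(Locale.ROOT, "index size: %.2f GB -> %.2f GB (%.2f GB smaller)%n",
                toGiB(folder36), toGiB(folder472), toGiB(folder36 - folder472));
    }
}
```

Run as-is, it prints a throughput change of -35.7% and an index that is 9.82 GB smaller: the 4.7.2 index is about 13% smaller on disk, yet serves roughly a third fewer queries per second.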
The index was more compact:

Lucene 4.7.2 (optimized):
General Index Information:
==========================
Num docs: 10485760
Num deleted docs: 0
Deletion rate: 0%
Number of files in FOLDER: 38
Total size of files in FOLDER: 70488334388 bytes (65.65 GB)
Commit Point Information:
=========================
Generation: 4519
Segments file name: segments_3hj
Number of segments: 12
Committed size: 70488333864 bytes (65.65 GB)
Number of files in COMMIT POINT: 37
Total size of files in COMMIT POINT: 70488334368 bytes (65.65 GB)

And, as expected, the search results were much better:

Lucene 4.7.2: 118 queries/sec

We thought that this might be a good direction, so our next step was to produce a more compact index during the indexing session itself, without running a full optimize at the end. To do that, we raised maxMergeMB from 4 GB to 6 GB. The collection was indeed more compact:

Win64 4.7.2 merge=6000 commitPoints:
General Index Information:
==========================
Num docs: 10485760
Num deleted docs: 0
Deletion rate: 0%
Number of files in FOLDER: 213
Total size of files in FOLDER: 83038952682 bytes (77.34 GB)
Commit Point Information:
=========================
Generation: 4406
Segments file name: segments_3ee
Number of segments: 14
Committed size: 70324985193 bytes (65.50 GB)
Number of files in COMMIT POINT: 91
Total size of files in COMMIT POINT: 70324985781 bytes (65.50 GB)

But the search results were not good at all:

Lucene 4.7.2: 72 queries/sec

Does this make sense? We thought of "optimize" as mainly decreasing the number of segments in the collection and removing deletions. In this scenario we had no deletions, and the number of segments did in fact decrease substantially (from 38 to 14), so why is this not reflected in search performance? Is there any other "optimize" influence or hidden operation that we are missing here? (Note that we are using LogByteSizeMergePolicy.
We know that TieredMergePolicy is supposed to be better in this respect, but it is important to us to keep the order of the documents the same between commit points.)

2) Search Directory

On Lucene 3.6, we did comprehensive testing and saw that the best search performance was reached when using an MMap directory (for indexing we are using SimpleFSDirectory). We tried the different directories again with Lucene 4.7.2, and while the differences were not big, it still seems that MMap is no longer the best option:

Lucene 4.7.2 with MMap: 72 queries/sec
Lucene 4.7.2 with SimpleFS: 84 queries/sec

Were there any changes around the MMap directory that might account for this difference? If so, do you think that those changes might account for the overall performance we are seeing?

3) Java 6 / Java 7

We are currently running on Java 6 (that is also the reason we stopped at Lucene 4.7.2 and did not move to 4.8). Is there reason to believe that the degradation might be connected to this?

Thanks again in advance!
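The MMap vs. SimpleFS question comes down, roughly, to two ways of reaching the same bytes: mapping the file into memory and copying out of it (page faults instead of read calls), or issuing an explicit seek followed by a read call on the file. The JDK-only sketch below contrasts those two access patterns. It is an analogy under that assumption, not Lucene's MMapDirectory or SimpleFSDirectory code, and the class ReadStrategies and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical sketch: two ways of reading the same file region with the plain JDK. */
final class ReadStrategies {

    /** SimpleFS-like access: an explicit seek followed by a read call on the file. */
    static byte[] readWithSeek(Path file, long pos, int len) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            byte[] out = new byte[len];
            raf.seek(pos);
            raf.readFully(out);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** MMap-like access: map the file read-only, then copy the bytes out of memory. */
    static byte[] readWithMmap(Path file, long pos, int len) {
        try (FileChannel ch = FileChannel.open(file)) {   // no options: opened for reading
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] out = new byte[len];
            map.position((int) pos);   // this simple sketch assumes files under 2 GB
            map.get(out);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a small temporary file and checks that both strategies return the same bytes. */
    static boolean sameBytes(long pos, int len) {
        try {
            byte[] data = new byte[8192];
            for (int i = 0; i < data.length; i++) {
                data[i] = (byte) i;
            }
            Path file = Files.createTempFile("segment", ".bin");
            // On Windows a mapped file cannot be deleted while the mapping is alive,
            // so deletion is deferred to JVM exit instead of done eagerly.
            file.toFile().deleteOnExit();
            Files.write(file, data);
            return Arrays.equals(readWithSeek(file, pos, len), readWithMmap(file, pos, len));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameBytes(1000, 64));  // prints true
    }
}
```

Both methods return identical bytes; what differs is how the operating system is asked for them. Timing the two loops against one of the real index files on the Windows machine would be one way to see whether the gap between 72 and 84 queries/sec already shows up at this raw level.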
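Alessandro's suggestion at the top of the thread is to count, per query, how often the IndexInputs refill their buffers and how many bytes they pull in, and to time those reads against the total query time. Doing that inside Lucene would mean wrapping its Directory/IndexInput classes; the sketch below only illustrates the counting idea on plain JDK streams. IoCounter and its methods are hypothetical names, not Lucene API. Because the counter sits underneath a buffer, each call it sees corresponds to one buffer refill.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/**
 * Hypothetical helper (not a Lucene class): counts the calls that reach the
 * wrapped source, the bytes it returns, and the time spent in those calls.
 */
final class IoCounter extends FilterInputStream {
    private long calls;
    private long bytes;
    private long nanos;

    IoCounter(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        long start = System.nanoTime();
        int b = super.read();
        nanos += System.nanoTime() - start;
        calls++;
        if (b >= 0) bytes++;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        long start = System.nanoTime();
        int n = super.read(buf, off, len);   // delegates straight to the wrapped stream
        nanos += System.nanoTime() - start;
        calls++;
        if (n > 0) bytes += n;
        return n;
    }

    long calls() { return calls; }
    long bytes() { return bytes; }
    long nanos() { return nanos; }

    /** Reads all of data through a buffer of bufferSize bytes and returns the counter. */
    static IoCounter drain(byte[] data, int bufferSize) {
        IoCounter counter = new IoCounter(new ByteArrayInputStream(data));
        try (InputStream in = new BufferedInputStream(counter, bufferSize)) {
            byte[] chunk = new byte[100];    // small reads, like decoding postings
            while (in.read(chunk) != -1) {
                // consume; only buffer refills reach the counter
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        IoCounter c = drain(new byte[10_000], 1024);
        System.out.println("refills=" + c.calls() + " bytes=" + c.bytes() + " ioNanos=" + c.nanos());
    }
}
```

With a 10,000-byte input, a 1,024-byte buffer needs at least ten refills while a 4,096-byte buffer needs far fewer, even though the byte count is identical. Comparing calls(), bytes() and nanos() per query between the two Lucene versions is the kind of measurement Alessandro describes.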