Missed the attachment, sorry. Thomas Becker wrote: > Hi all, > > I'm experiencing a performance degradation after migrating to 2.9 and running > some tests. I'm getting out of ideas and any help to identify the reasons why > 2.9 is slower than 2.4 are highly appreciated. > > We've had some issues with custom sorting in lucene 2.4.1. We worked around > them > by sorting the resultsets manually and caching the results after sorting > (memory > consuming but fast). > > I now migrated to lucene 2.9.0RC4. Build some new FieldComparatorSource > implementation for sorting and refactored all deprecated api calls to the new > lucene 2.9 api. > > Everything works fine from a functional perspective. But performance severly > is > (negatively) affected by lucene 2.9. > > I profiled the application for a couple of hours, build a jmeter load test and > compared the following scenarios: > > 1. lucene 2.9 - new api > 2. lucene 2.9 - old api and custom sorting after lucene > 3. lucene 2.4.1 - old api and custom sorting after lucene (what we had up2now) > > Please find attached an rrd graph showing the results. The lighter the color > the > faster the request has been served. y=# requests, x=time. > > Most interestingly simply switching the lucene jars between 2.4 and 2.9 > degraded > response times and therefore throughput (see results of testcase 2 and 3). > Adapting to the new api decreased performance again. The difference between > testcase 1 and 2 is most probably due to precached custom sorted results. > > The application under test is a dedicated lucene search engine doing nothing > else, but serving search requests. We're running a cluster of them in prd and > it's incredibly fast. With the old implementation and prd traffic we've above > 98% of the requests served in 200ms. > The index under test contains about 3 million documents (with lots of fields), > consumes about 2,5gig disk space and is stored on a tmpfs RAMDISK provided by > the linux kernel. > > Most interesting methods used for searching are: > > getHitsCount (is there a way to speed this up?): > > public int getHitsCount(String query, Filter filter) throws > LuceneServiceException { > log.debug("getHitsCount('{}, {}')", query, filter); > if (StringUtils.isBlank(query)) { > log.warn("getHitsCount: empty lucene query"); > return 0; > } > long startTimeMillis = System.currentTimeMillis(); > int count = 0; > > if (indexSearcher == null) { > return 0; > } > > BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT); > Query q = null; > try { > q = createQuery(query); > TopScoreDocCollector tsdc = > TopScoreDocCollector.create(1, true); > indexSearcher.search(q, filter, tsdc); > count = tsdc.getTotalHits(); > log.info("getHitsCount: count = {}",count); > } catch (ParseException ex) { > throw new LuceneServiceException("invalid lucene > query:" + query, ex); > } catch (IOException e) { > throw new LuceneServiceException(" indexSearcher could > be corrupted", e); > } finally { > long durationMillis = System.currentTimeMillis() - > startTimeMillis; > if (durationMillis > slowQueryLimit) { > log.warn("getHitsCount: Slow query: {} ms, > query={}", durationMillis, query); > } > log.debug("getHitsCount: query took {} ms", > durationMillis); > } > return count; > } > > search: > public List<Document> search(String query, Filter filter, Sort sort, > int from, > int size) throws LuceneServiceException { > log.debug("{} search('{}', {}, {}, {}, {})", new Object[] { > indexAlias, query, > filter, sort, from, size }); > long startTimeMillis = System.currentTimeMillis(); > > List<Document> docs = new ArrayList<Document>(); > if (indexSearcher == null) { > return docs; > } > Query q = null; > try { > if (query == null) { > log.warn("search: lucene query is null..."); > return docs; > } > q = createQuery(query); > BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT); > if (size < 0 || size > maxNumHits) { > // set hard limit for numHits > size = maxNumHits; > if (log.isDebugEnabled()) > log.debug("search: Size set to > hardlimit: {} for query: {} with filter: > {}", new Object[] { size, query, filter }); > } > TopFieldCollector collector = > TopFieldCollector.create(sort, size + from, > true, false, false, true); > indexSearcher.search(q, filter, collector); > if(size > collector.getTotalHits()) > size = collector.getTotalHits(); > if (size > 100000) > log.info("search: size: {} bigger than 100.000 > for query: {} with filter: > {}", new Object[] { size, query, filter }); > TopDocs td = collector.topDocs(from, size); > ScoreDoc[] scoreDocs = td.scoreDocs; > for (ScoreDoc scoreDoc : scoreDocs) { > docs.add(indexSearcher.doc(scoreDoc.doc)); > } > } catch (ParseException e) { > log.warn("search: ParseException: {}", e.getMessage()); > if (log.isDebugEnabled()) > log.warn("search: ParseException: ", e); > return Collections.emptyList(); > } catch (IOException e) { > log.warn("search: IOException: ", e); > return Collections.emptyList(); > } finally { > long durationMillis = System.currentTimeMillis() - > startTimeMillis; > if (durationMillis > slowQueryLimit) { > log.warn("search: Slow query: {} ms, query={}, > indexUsed={}", > new Object[] { durationMillis, > query, > indexSearcher.getIndexReader().directory() }); > } > log.debug("search: query took {} ms", durationMillis); > } > return docs; > } > > I'm wondering why others are experiencing better performance with 2.9 and why > our implementations performance is going bad. Maybe our way of using the 2.9 > api > is not the best and sorting is definetly expensive. > > Any ideas are appreciated. I'm torning out my hair since hours and days to > find > the root cause. Also hints how I could find the bottlenecks myself are > appreciated. > > Cheers, > Thomas >
-- Thomas Becker Senior JEE Developer net mobile AG Zollhof 17 40221 Düsseldorf GERMANY Phone: +49 211 97020-195 Fax: +49 211 97020-949 Mobile: +49 173 5146567 (private) E-Mail: mailto:thomas.bec...@net-m.de Internet: http://www.net-m.de Registergericht: Amtsgericht Düsseldorf, HRB 48022 Vorstand: Theodor Niehues (Vorsitzender), Frank Hartmann, Kai Markus Kulas, Dieter Plassmann Vorsitzender des Aufsichtsrates: Dr. Michael Briem
--------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org