Hi folks,

We are currently using Lucene 4.5, we are hitting some bottlenecks, and we would appreciate some input from the community.
This particular index (about 10GB on disk) is guaranteed to receive no updates, so we made it a single-segment index by doing a forceMerge(1). The index is guaranteed to be in memory as well: we use MMapDirectory and the whole mapping is mlocked after load, so there is no disk I/O.

Our runtime/search use-case is very simple: run filters to select all docs that match the conditions specified in a filter query (we do not use Lucene scoring) and return the first 100 docs that match (this is an over-simplification).

On a machine with nothing else running, we are unable to move the needle on CPU utilization to serve higher QPS. When we profile to see where time is being spent, most of it is in BlockTreeTermsReader.FieldReader.iterator(). CPU usage doesn't cross 30% (we have multiple threads, one per client connected over a Jetty connection, all taken from a bounded thread pool). We tried the usual suspects: tweaking the size of the thread pool, and changing JVM parameters such as NewSize and heap size, using CMS for the old gen, ParNew for the new gen, etc.

Does anyone here have any pointers or general suggestions on how we can get good performance out of Lucene 4.x? Specifically, IndexSearcher performance improvements for large, single-segment AtomicReaders. I'll share more specifics if necessary, but I'd like to hear from folks here what your experience has been and what you did to speed up your IndexSearchers to improve throughput *and/or* latency.

Thanks!

--
Arvind Kalyan
http://www.linkedin.com/in/base16
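P.S. To make the "return the first 100 matches" part concrete, here is a minimal, self-contained sketch (not our actual code, and deliberately free of Lucene dependencies) of the early-termination pattern we rely on: a collector records matching doc IDs and aborts the scan with a control-flow exception once the cap is reached. Lucene 4.x supports this same pattern in a real Collector by throwing its CollectionTerminatedException from collect().

```java
import java.util.ArrayList;
import java.util.List;

public class EarlyTerminatingCollector {
    // Marker exception used purely for control flow; Lucene ships
    // o.a.l.search.CollectionTerminatedException for exactly this purpose.
    static class Terminated extends RuntimeException {}

    private final int limit;
    private final List<Integer> hits = new ArrayList<Integer>();

    EarlyTerminatingCollector(int limit) { this.limit = limit; }

    // Analogous to Collector.collect(int doc): record a match and bail
    // out once we have enough, so the filter never scans the whole index.
    void collect(int docId) {
        hits.add(docId);
        if (hits.size() >= limit) throw new Terminated();
    }

    List<Integer> hits() { return hits; }

    public static void main(String[] args) {
        EarlyTerminatingCollector c = new EarlyTerminatingCollector(100);
        try {
            // Stand-in for iterating a filter's matching docs: pretend
            // every 3rd doc in a million-doc segment matches.
            for (int doc = 0; doc < 1_000_000; doc++) {
                if (doc % 3 == 0) c.collect(doc);
            }
        } catch (Terminated expected) {
            // Scan stopped early; we have our first 100 matches.
        }
        System.out.println(c.hits().size()); // 100
    }
}
```

The point of the exception is that the enclosing loop (in Lucene, the per-segment scorer loop) stops immediately instead of visiting the remaining docs, which matters for selective filters on a large single segment.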