Explore other in-memory postinglist formats for realtime search
---------------------------------------------------------------

                 Key: LUCENE-2346
                 URL: https://issues.apache.org/jira/browse/LUCENE-2346
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Index
            Reporter: Michael Busch
            Assignee: Michael Busch
            Priority: Minor
             Fix For: 3.1


The current in-memory posting list format might not be optimal for searching. 
VInt decoding performance and the lack of skip lists are arguably the 
biggest bottlenecks.
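
To make the VInt concern concrete, here is a minimal sketch of variable-length
int encoding and decoding (7 payload bits per byte, high bit set when more
bytes follow).  The class and method names (VIntSketch, writeVInt, readVInt)
are made up for illustration and are not the actual indexing-chain code.  It
shows that each value costs a data-dependent loop to decode, and that the
position of the n-th value is unknown until all earlier values are decoded.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration class, not part of Lucene.
final class VIntSketch {

    // Encodes value with 7 payload bits per byte, low-order bits first.
    // The high bit of a byte is set when another byte follows.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Decodes one value starting at pos[0] and advances pos[0] past it.
    // The loop length depends on the data, and there is no way to jump
    // to a later value without decoding the ones before it.
    static int readVInt(byte[] buf, int[] pos) {
        byte b = buf[pos[0]++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos[0]++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }
}
```

Because the encoded width varies per value, a binary search over such a byte
slice is not possible without extra structure, which is part of why skip data
matters here.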

For LUCENE-2312 we should investigate other formats.

Some ideas:
- PFOR or packed ints for posting slices?
- Maybe even int[] slices instead of byte slices? This would be great for 
search performance, but the additional memory overhead might not be acceptable.
- For realtime search it's usually desirable to evaluate the most recent 
documents first.  So using backward pointers instead of forward pointers and 
having the postinglist pointer point to the most recent docID in a list is 
something to consider.
- Skipping: if we use fixed-length postings ([packed] ints) we can do binary 
search within a slice.  We can then also locate a slice's pointer without 
scanning and thus skip entire slices quickly.  Is that sufficient or would we 
need more skipping layers, so that it's possible to skip directly to 
particular slices?
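
The int[] slice, backward pointer and binary search ideas above could be
combined roughly as in the following sketch.  Everything in it is an
assumption made for illustration: the names (IntSlicePostings, Slice, floor),
the tiny fixed slice size, and the premise that docIDs are added in increasing
order.  It is not the existing slice layout and it ignores memory pooling,
concurrency and frequencies/positions.

```java
import java.util.Arrays;

// Hypothetical illustration class, not part of Lucene.
final class IntSlicePostings {
    static final int SLICE_SIZE = 4; // deliberately tiny for the example

    static final class Slice {
        final int[] docIds = new int[SLICE_SIZE];
        int count;            // number of used entries in docIds
        final Slice previous; // backward pointer to the next-older slice

        Slice(Slice previous) {
            this.previous = previous;
        }
    }

    // The list pointer references the most recent slice.
    private Slice newest;

    // Appends a docID; assumes docIDs arrive in increasing order.
    void add(int docId) {
        if (newest == null || newest.count == SLICE_SIZE) {
            newest = new Slice(newest);
        }
        newest.docIds[newest.count++] = docId;
    }

    // Returns the largest docID <= target, or -1 if there is none.
    // Walks from the newest slice backward, skips a whole slice by
    // looking only at its first entry, then binary-searches one slice.
    int floor(int target) {
        for (Slice s = newest; s != null; s = s.previous) {
            if (s.docIds[0] > target) {
                continue; // every entry in this slice is too large
            }
            int idx = Arrays.binarySearch(s.docIds, 0, s.count, target);
            if (idx < 0) {
                idx = -idx - 2; // entry just before the insertion point
            }
            return s.docIds[idx];
        }
        return -1;
    }
}
```

Skipping here is still linear in the number of slices, since each slice is
visited once via its backward pointer.  Whether that is good enough, or
whether an extra layer pointing directly at particular slices is needed, is
exactly the open question in the last bullet.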


It would be awesome to find a format that doesn't slow down "normal" indexing, 
but is very efficient for in-memory searches.  If we can't find such a 
one-size-fits-all format, we should have a separate indexing chain for 
real-time indexing.


