[ https://issues.apache.org/jira/browse/LUCENE-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641747#action_12641747 ]
Michael McCandless commented on LUCENE-1426:
--------------------------------------------

bq. TermDocs could have a list of Attributes that the posting list offers.

I like this approach. Though unlike LUCENE-1422, where Token remains
separate from TokenStream (and I'm still not sure it should be...?), I
think for TermDocs there would be no analog of a separate Token. I.e.,
it would look something like this:

    myPerDocAttr = termDocs.getAttribute(MyPerDoc.class);
    while (termDocs.next()) {
      x = myPerDocAttr.getValue(...);
    }

However, this form of flexibility is actually beyond what I'm aiming
for in the first step of reader flexibility (there are so many facets
of "flexible indexing"!).

For starters I'd like to allow flexibility in how you encode the
existing postings (doc/freq/positions/payloads), whereas the attribute
approach is about extending what is actually stored into & read from
the index.

I think we should do both, but my focus now is on the first one,
specifically to be able to drop in a codec that uses pulsing, a less
RAM-intensive terms dict index, and/or PFOR, etc.

> Next steps towards flexible indexing
> ------------------------------------
>
>                 Key: LUCENE-1426
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1426
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1426.patch
>
>
> In working on LUCENE-1410 (PFOR compression) I tried to prototype
> switching the postings files to use PFOR instead of vInts for
> encoding.
> But it quickly became difficult. E.g., we currently mux the skip data
> into the .frq file, which messes up the int blocks. We inline
> payloads with positions, which would also mess up the int blocks.
> Skipping offsets and TermInfo offsets hardwire the file pointers of
> frq & prox files, yet I need to change these to block + offset, etc.
> Separately this thread also started up, on how to customize how Lucene
> stores positional information in the index:
>   http://www.gossamer-threads.com/lists/lucene/java-user/66264
> So I decided to make a bit more progress towards "flexible indexing"
> by first modularizing/isolating the classes that actually write the
> index format. The idea is to capture the logic of each (terms, freq,
> positions/payloads) into separate interfaces and switch the flushing
> of a new segment as well as writing the segment during merging to use
> the same APIs.
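
As a rough illustration of the modularization described in the issue
(separate interfaces for terms and docs/positions, with flush and merge
going through the same write API), here is a minimal sketch. All names
in it (DocsConsumer, TermsConsumer, Posting, RecordingTermsConsumer,
PostingsSketch) are hypothetical and are not taken from the attached
patch or from Lucene itself; the "codec" just records what it is asked
to write, and the merge assumes the second segment's doc IDs can simply
be shifted by a doc base.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interfaces for illustration only; not Lucene classes.

/** Receives the postings of one term: documents, then their positions. */
interface DocsConsumer {
    void addDoc(int docID, int termFreq);
    void addPosition(int position);
}

/** Receives terms and hands out a DocsConsumer for each one. */
interface TermsConsumer {
    DocsConsumer startTerm(String term);
    void finishTerm(String term, int docCount);
}

/** One in-memory posting: a document and the term's positions in it. */
final class Posting {
    final int docID;
    final int[] positions;

    Posting(int docID, int... positions) {
        this.docID = docID;
        this.positions = positions;
    }
}

/** Toy "codec": records each write call as text instead of encoding bytes. */
final class RecordingTermsConsumer implements TermsConsumer, DocsConsumer {
    final List<String> log = new ArrayList<>();

    @Override public DocsConsumer startTerm(String term) {
        log.add("term " + term);
        return this;
    }

    @Override public void addDoc(int docID, int termFreq) {
        log.add("doc " + docID + " freq " + termFreq);
    }

    @Override public void addPosition(int position) {
        log.add("pos " + position);
    }

    @Override public void finishTerm(String term, int docCount) {
        log.add("end " + term + " docs=" + docCount);
    }
}

final class PostingsSketch {
    /** Shared write path: both flush and merge end up here. */
    static void writeTerm(TermsConsumer out, String term, List<Posting> postings) {
        DocsConsumer docs = out.startTerm(term);
        for (Posting p : postings) {
            docs.addDoc(p.docID, p.positions.length);
            for (int pos : p.positions) {
                docs.addPosition(pos);
            }
        }
        out.finishTerm(term, postings.size());
    }

    /** Flush: write postings buffered in RAM as a new segment. */
    static void flush(TermsConsumer out, String term, List<Posting> buffered) {
        writeTerm(out, term, buffered);
    }

    /** Merge: append seg1 after seg0, shifting seg1's doc IDs by docBase. */
    static void merge(TermsConsumer out, String term,
                      List<Posting> seg0, List<Posting> seg1, int docBase) {
        List<Posting> merged = new ArrayList<>(seg0);
        for (Posting p : seg1) {
            merged.add(new Posting(p.docID + docBase, p.positions));
        }
        writeTerm(out, term, merged);
    }

    public static void main(String[] args) {
        RecordingTermsConsumer codec = new RecordingTermsConsumer();
        flush(codec, "lucene", List.of(new Posting(0, 3, 7)));
        merge(codec, "index", List.of(new Posting(1, 2)), List.of(new Posting(0, 5)), 10);
        codec.log.forEach(System.out::println);
    }
}
```

Because flush and merge only talk to TermsConsumer/DocsConsumer, swapping
in a different encoding (pulsing, PFOR blocks, etc.) would in principle
mean providing another implementation of those two interfaces, without
touching either caller.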