[ 
https://issues.apache.org/jira/browse/LUCENE-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641574#action_12641574
 ] 

Michael Busch commented on LUCENE-1426:
---------------------------------------

{quote}
+1 This sounds like a great way to approach flexible indexing: incrementally. 
{quote}

Couldn't agree more. This is great!

{quote}
The next step, which is trickier, is to modularize/genericize the
classes that read from the index, and then refactor
SegmentTerm(Enum,Docs,Positions) to use that codec API.
{quote}

Yes, this is definitely the tricky part. I've been thinking a bit about this and 
was wondering whether the read APIs could do something similar to the new Token 
API (LUCENE-1422). TermDocs could have a list of Attributes that the posting 
list offers. If, for example, no payloads are stored in the posting list, then 
TermDocs should not offer the corresponding Attribute.
This approach should be just as fast as the current API. When the application 
opens a TermDocs, it could check for the offered Attributes before it starts 
iterating the posting list, and keep references to those Attributes. (In fact, 
that's exactly the same approach as the TokenStream/Token/Consumer approach in 
LUCENE-1422.)
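
To make the idea a bit more concrete, here is a rough sketch of what I have in 
mind. All names in it (SketchTermDocs, DocFreqAttribute, PayloadAttribute, 
hasAttribute, getAttribute) are made up for illustration; none of them exist in 
Lucene or in the attached patch, and the in-memory arrays only stand in for a 
real posting list. It also assumes one payload per document to keep things 
short.

```java
import java.util.HashMap;
import java.util.Map;

public class AttributeTermDocsSketch {

    /** Hypothetical marker interface for data a posting list can offer. */
    interface Attribute {}

    /** Hypothetical attribute holding the current doc id and term frequency. */
    static final class DocFreqAttribute implements Attribute {
        int doc;
        int freq;
    }

    /** Hypothetical attribute holding the current payload (simplified: one per doc). */
    static final class PayloadAttribute implements Attribute {
        byte[] payload;
    }

    /** Hypothetical attribute-based posting iterator; arrays stand in for index files. */
    static final class SketchTermDocs {
        private final Map<Class<? extends Attribute>, Attribute> offered = new HashMap<>();
        private final DocFreqAttribute docFreq = new DocFreqAttribute();
        private final PayloadAttribute payloadAtt; // null when no payloads are stored
        private final int[] docs;
        private final int[] freqs;
        private final byte[][] payloads;
        private int upto = -1;

        SketchTermDocs(int[] docs, int[] freqs, byte[][] payloads) {
            this.docs = docs;
            this.freqs = freqs;
            this.payloads = payloads;
            offered.put(DocFreqAttribute.class, docFreq);
            if (payloads != null) {
                // Only offer the payload attribute if the posting list stores payloads.
                payloadAtt = new PayloadAttribute();
                offered.put(PayloadAttribute.class, payloadAtt);
            } else {
                payloadAtt = null;
            }
        }

        boolean hasAttribute(Class<? extends Attribute> type) {
            return offered.containsKey(type);
        }

        <T extends Attribute> T getAttribute(Class<T> type) {
            Attribute att = offered.get(type);
            if (att == null) {
                throw new IllegalArgumentException("not offered: " + type.getSimpleName());
            }
            return type.cast(att);
        }

        /** Advances to the next posting and refreshes the attribute instances in place. */
        boolean next() {
            if (++upto >= docs.length) {
                return false;
            }
            docFreq.doc = docs[upto];
            docFreq.freq = freqs[upto];
            if (payloadAtt != null) {
                payloadAtt.payload = payloads[upto];
            }
            return true;
        }
    }

    /** Consumer side: look the attribute up once, then read its fields while iterating. */
    static int sumFreqs(SketchTermDocs td) {
        DocFreqAttribute df = td.getAttribute(DocFreqAttribute.class);
        int sum = 0;
        while (td.next()) {
            sum += df.freq;
        }
        return sum;
    }

    public static void main(String[] args) {
        SketchTermDocs td = new SketchTermDocs(new int[] {1, 5}, new int[] {2, 3}, null);
        System.out.println("payloads offered: " + td.hasAttribute(PayloadAttribute.class));
        System.out.println("total freq: " + sumFreqs(td));
    }
}
```

The point is that the map lookup happens once, before the loop; inside the loop 
the consumer only reads fields of an object it already holds, so there should 
be no per-posting overhead compared to calling doc() and freq() today.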

Thoughts?

> Next steps towards flexible indexing
> ------------------------------------
>
>                 Key: LUCENE-1426
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1426
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1426.patch
>
>
> In working on LUCENE-1410 (PFOR compression) I tried to prototype
> switching the postings files to use PFOR instead of vInts for
> encoding.
> But it quickly became difficult.  EG we currently mux the skip data
> into the .frq file, which messes up the int blocks.  We inline
> payloads with positions which would also mess up the int blocks.
> Skipping offsets and TermInfo offsets hardwire the file pointers of
> frq & prox files yet I need to change these to block + offset, etc.
> Separately this thread also started up, on how to customize how Lucene
> stores positional information in the index:
>   http://www.gossamer-threads.com/lists/lucene/java-user/66264
> So I decided to make a bit more progress towards "flexible indexing"
> by first modularizing/isolating the classes that actually write the
> index format.  The idea is to capture the logic of each (terms, freq,
> positions/payloads) into separate interfaces and switch the flushing
> of a new segment as well as writing the segment during merging to use
> the same APIs.
