[jira] Commented: (LUCENE-1426) Next steps towards flexible indexing

Paul Elschot (JIRA) Tue, 21 Oct 2008 14:19:46 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641599#action_12641599
 ]


Paul Elschot commented on LUCENE-1426:
--------------------------------------

bq. ... it would make sense to use VInts for very short postings and PFOR for 
the rest. I just do not remember rationale behind it.
bq. ... cool idea to actually inline very short postings into term dict instead 
of storing offset.

IIRC the rationale was that PFOR shows most of its performance benefit on 
integer arrays of more than 100 elements.
Shorter lists of numbers might also benefit from using (P)FOR instead of VInt; 
I don't know where the break-even size lies.
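
One rough way to explore the break-even point is to compare encoded sizes for a 
given list. The sketch below is only an illustration, not code from Lucene or 
from the LUCENE-1410 patch: the class name, the method names and the per-block 
header size passed to forTotalBytes are all assumptions. It also covers encoded 
size only; decode speed, which is what the 100-element remark is about, would 
have to be measured separately.

```java
// Hypothetical helper, not part of Lucene: compares encoded sizes only.
final class EncodedSizeSketch {

    /** Bytes used by a VInt: 7 payload bits per byte, high bit = "more follows". */
    static int vIntBytes(int value) {
        int bytes = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }

    /** Total bytes when every value is written as its own VInt. */
    static int vIntTotalBytes(int[] values) {
        int total = 0;
        for (int v : values) {
            total += vIntBytes(v);
        }
        return total;
    }

    /** Bits needed for the largest (non-negative) value, at least 1. */
    static int bitsNeeded(int[] values) {
        int or = 0;
        for (int v : values) {
            or |= v; // the highest set bit of the OR equals that of the maximum
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
    }

    /**
     * Frame-of-reference size without exceptions: an assumed fixed header
     * plus values.length * bitsNeeded bits, rounded up to whole bytes.
     */
    static int forTotalBytes(int[] values, int headerBytes) {
        long bits = (long) values.length * bitsNeeded(values);
        return headerBytes + (int) ((bits + 7) / 8);
    }
}
```

With an assumed 4-byte header, a four-element list of small values is smaller as 
VInts, while a 128-element list of values below 16 is roughly half the size in 
the fixed-width form. Where the curves cross depends on the header size and the 
value distribution.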

bq. for starters (we) could simply implement random access as "load & decode 
the entire block, then look at the part you want" and then assess the cost.
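
The "decode the entire block, then look at the part you want" approach quoted 
above could be sketched as follows. This is a minimal illustration with a plain 
fixed-width bit packing; the class, the method names and the packing layout are 
assumptions and do not reflect the actual block format under discussion.

```java
// Hypothetical sketch: random access implemented as "decode everything, index once".
final class BlockRandomAccessSketch {

    /** Packs values (each assumed to fit in bitsPerValue bits, 1..32) low bits first. */
    static long[] pack(int[] values, int bitsPerValue) {
        long[] packed = new long[(values.length * bitsPerValue + 63) / 64];
        for (int i = 0; i < values.length; i++) {
            long bitPos = (long) i * bitsPerValue;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = values[i] & 0xFFFFFFFFL;
            packed[word] |= v << shift;
            if (shift + bitsPerValue > 64) {
                // value straddles two words: spill the high bits into the next one
                packed[word + 1] |= v >>> (64 - shift);
            }
        }
        return packed;
    }

    /** Decodes the whole block back into an int array. */
    static int[] decodeBlock(long[] packed, int count, int bitsPerValue) {
        int[] out = new int[count];
        long mask = (1L << bitsPerValue) - 1;
        for (int i = 0; i < count; i++) {
            long bitPos = (long) i * bitsPerValue;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = packed[word] >>> shift;
            if (shift + bitsPerValue > 64) {
                v |= packed[word + 1] << (64 - shift);
            }
            out[i] = (int) (v & mask);
        }
        return out;
    }

    /** Naive random access: pay for a full block decode, then pick one entry. */
    static int get(long[] packed, int count, int bitsPerValue, int index) {
        return decodeBlock(packed, count, bitsPerValue)[index];
    }
}
```

The cost to assess is then the full decodeBlock call per lookup; a caller that 
reads several entries from the same block would presumably cache the decoded 
array instead of calling get repeatedly.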

I've just started some performance tests on PFOR patching (i.e. filling in the 
exceptions), and I'm not happy with what I'm seeing. More on this later at 
LUCENE-1410.
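
For readers unfamiliar with the term, "patching" can be illustrated with a 
simplified sketch. Values that fit in the chosen bit width go into fixed-width 
slots; values that do not fit are stored separately as exceptions and written 
back over their slots after the bulk decode. The layout below, with explicit 
exception positions, is an assumption chosen for clarity; it is not the layout 
used in the LUCENE-1410 patch, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified PFOR-style split and patch; slots are left unpacked here.
final class PforPatchSketch {
    final int[] slots;               // values that fit, 0 where an exception sits
    final int[] exceptionPositions;  // indexes of values that did not fit
    final int[] exceptionValues;     // the full values for those indexes

    private PforPatchSketch(int[] slots, int[] positions, int[] values) {
        this.slots = slots;
        this.exceptionPositions = positions;
        this.exceptionValues = values;
    }

    /** Splits non-negative values into slots and exceptions; bitsPerValue in 1..31. */
    static PforPatchSketch split(int[] values, int bitsPerValue) {
        int[] slots = new int[values.length];
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            if ((values[i] >>> bitsPerValue) == 0) {
                slots[i] = values[i];      // fits in the fixed width
            } else {
                positions.add(i);          // too large: record as an exception
            }
        }
        int[] pos = new int[positions.size()];
        int[] exc = new int[positions.size()];
        for (int k = 0; k < pos.length; k++) {
            pos[k] = positions.get(k);
            exc[k] = values[pos[k]];
        }
        return new PforPatchSketch(slots, pos, exc);
    }

    /** Bulk "decode" of the slots followed by the patching pass. */
    int[] decode() {
        int[] out = slots.clone();
        for (int k = 0; k < exceptionPositions.length; k++) {
            out[exceptionPositions[k]] = exceptionValues[k];   // patch
        }
        return out;
    }
}
```

The patching pass is the loop in decode; its scattered writes are the part whose 
cost grows with the number of exceptions in a block.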


On allowing a payload to accompany the field norms:
bq. Couldn't stored fields, once they are faster (with column-stride fields, 
LUCENE-1231) solve this?

Yes.


> Next steps towards flexible indexing
> ------------------------------------
>
>                 Key: LUCENE-1426
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1426
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1426.patch
>
>
> In working on LUCENE-1410 (PFOR compression) I tried to prototype
> switching the postings files to use PFOR instead of vInts for
> encoding.
>
> But it quickly became difficult.  EG we currently mux the skip data
> into the .frq file, which messes up the int blocks.  We inline
> payloads with positions which would also mess up the int blocks.
> Skipping offsets and TermInfo offsets hardwire the file pointers of
> frq & prox files yet I need to change these to block + offset, etc.
>
> Separately this thread also started up, on how to customize how Lucene
> stores positional information in the index:
>
>   http://www.gossamer-threads.com/lists/lucene/java-user/66264
>
> So I decided to make a bit more progress towards "flexible indexing"
> by first modularizing/isolating the classes that actually write the
> index format.  The idea is to capture the logic of each (terms, freq,
> positions/payloads) into separate interfaces and switch the flushing
> of a new segment as well as writing the segment during merging to use
> the same APIs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


