[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.

Michael McCandless (JIRA) Mon, 10 Jan 2011 04:05:13 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979553#action_12979553 ]


Michael McCandless commented on LUCENE-2843:
--------------------------------------------

bq. If a system is forced to swap out, it'll swap your explicitly managed RAM 
just as likely as memory-mapped files.

In fact, even when it is not under any real memory pressure, the OS will swap 
out RAM that you have not accessed recently.  On balance this is a good policy 
if your metric is the total throughput of all programs.

But if your metric is the latency of search queries, this is an awful policy.

Fortunately, OSs (at least Windows & Linux) give you some tunability here.  
Unfortunately, the tunable is global and it defaults "badly" for programs that 
do make a careful distinction between which data structures are best held in 
RAM and which data is best left on disk.

If I could, I would offer an option to pin these pages so the OS cannot swap 
them out, but I don't think we can (easily) do that from Java (and I think 
you'd have to be root).  Lacking pinning, the best approximation we can manage 
is to pull these into RAM ourselves.
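
To make the two options concrete, here is a minimal sketch (not Lucene code) 
that contrasts copying a file into a heap array with memory-mapping it and 
asking the OS to page it in.  The class name, method names and temp file are 
hypothetical, and it assumes a JVM with java.nio.file.  Neither approach pins 
pages: MappedByteBuffer.load() is only a best-effort hint, and heap memory can 
still be swapped out.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TermsIndexLoading {

    /** Copies the whole file into a heap byte[]. The OS may still swap this memory out. */
    public static byte[] loadIntoHeap(Path file) throws IOException {
        return Files.readAllBytes(file);
    }

    /**
     * Memory-maps the file and asks the OS to page it in.
     * load() is best-effort only; it does not pin the pages.
     */
    public static MappedByteBuffer mapAndPreload(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            buf.load();
            // The mapping stays valid after the channel is closed.
            return buf;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for a terms index file.
        Path tmp = Files.createTempFile("terms-index", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {1, 2, 3, 4});

        byte[] heap = loadIntoHeap(tmp);
        MappedByteBuffer mapped = mapAndPreload(tmp);
        System.out.println(heap.length + " bytes on heap, "
            + mapped.remaining() + " bytes mapped");
    }
}
```

The heap copy at least makes the residency decision ours at load time; whether 
the pages stay resident afterwards is still up to the OS.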

> Add variable-gap terms index impl.
> ----------------------------------
>
>                 Key: LUCENE-2843
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2843
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.0
>
>         Attachments: LUCENE-2843.patch, LUCENE-2843.patch
>
>
> PrefixCodedTermsReader/Writer (used by all "real" core codecs) already
> supports pluggable terms index impls.
> The only impl we have now is FixedGapTermsIndexReader/Writer, which
> picks every Nth (default 32) term and holds it in efficient packed
> int/byte arrays in RAM.  This is already an enormous improvement (RAM
> reduction, init time) over 3.x.
> This patch adds another impl, VariableGapTermsIndexReader/Writer,
> which lets you specify an arbitrary IndexTermSelector to pick which
> terms are indexed, and then uses an FST to hold the indexed terms.
> This is typically even more memory efficient than packed int/byte
> arrays, though it does not support ord(), so it's not quite a fair
> comparison.
> I had to relax the terms index plugin api for
> PrefixCodedTermsReader/Writer to not assume that the terms index impl
> supports ord.
> I also did some cleanup of the FST/FSTEnum APIs and impls, and broke
> out separate seekCeil and seekFloor in FSTEnum.  E.g., we need seekFloor
> when the FST is used as a terms index but seekCeil when it's holding
> all terms in the index (which is what SimpleText uses FSTs for).
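
The fixed-gap versus variable-gap selection in the description above, and the 
difference between floor and ceiling seeks, can be sketched roughly as follows.  
This is an illustration only: the TermSelector interface here is a hypothetical 
stand-in and not the signature of Lucene's IndexTermSelector, a TreeSet stands 
in for the FST, and the first-character policy is just one made-up example of a 
variable-gap rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class TermsIndexSketch {

    /** Hypothetical selector; NOT the real IndexTermSelector API. */
    public interface TermSelector {
        boolean isIndexTerm(String previousTerm, String term, int ord);
    }

    /** Fixed gap: index every Nth term in sorted order. */
    public static TermSelector everyNth(int interval) {
        return (prev, term, ord) -> ord % interval == 0;
    }

    /** Variable gap (made-up policy): index a term when its first character changes. */
    public static TermSelector firstCharChange() {
        // Assumes terms are non-empty.
        return (prev, term, ord) -> prev == null || prev.charAt(0) != term.charAt(0);
    }

    /** Walks the sorted terms once and keeps those the selector picks. */
    public static List<String> selectIndexTerms(List<String> sortedTerms, TermSelector selector) {
        List<String> indexed = new ArrayList<>();
        String prev = null;
        for (int ord = 0; ord < sortedTerms.size(); ord++) {
            String term = sortedTerms.get(ord);
            if (selector.isIndexTerm(prev, term, ord)) {
                indexed.add(term);
            }
            prev = term;
        }
        return indexed;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList(
            "apple", "apply", "banana", "band", "bandit", "cat", "cow", "dog");

        System.out.println(selectIndexTerms(terms, everyNth(3)));
        List<String> variable = selectIndexTerms(terms, firstCharChange());
        System.out.println(variable);

        // TreeSet as a stand-in for the FST holding the indexed terms.
        TreeSet<String> index = new TreeSet<>(variable);
        // Floor: greatest indexed term <= target, i.e. where a scan of the
        // full terms dictionary should start when the structure is a terms index.
        System.out.println(index.floor("bat"));
        // Ceiling: smallest term >= target, the natural seek when the
        // structure holds every term.
        System.out.println(index.ceiling("bat"));
    }
}
```

With a sparse index the floor entry is the one you want, because the target 
term (if it exists) can only appear at or after that entry in the full terms 
dictionary.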

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


