[jira] Updated: (LUCENE-1340) Make it possible not to include TF information in index

Michael McCandless (JIRA) Sat, 19 Jul 2008 02:34:57 -0700

     [ 
https://issues.apache.org/jira/browse/LUCENE-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Michael McCandless updated LUCENE-1340:
---------------------------------------

    Attachment: LUCENE-1340.patch

Thanks eks, that was fast -- I think you set a new record!

The patch looks good, though we definitely need some solid unit tests
here.  I made some small (whitespace, spelling, naming) corrections &
attached a new rev of the patch.

One question I have: right now, if a single field has mixed true/false
for omitTf, you set it to false, meaning we start storing the term
freq, positions and payloads again.  Can/should we do the reverse
instead?  If we did, we could make some further optimizations, e.g.
right now we consume RAM storing all positions/payloads on a field that
has omitTf=true, on the possibility that we may still see omitTf=false
in the same session.
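
To make the two options concrete, here is a minimal sketch of how the
per-field flag could be combined when instances of the same field
disagree.  OmitTfPolicy and both method names are hypothetical and do
not come from the patch; they only illustrate the two policies.

```java
// Hypothetical sketch: names are illustrative, not taken from the patch.
final class OmitTfPolicy {

    // Behavior described for the current patch: a single omitTf=false
    // instance turns the field back into a normal field that stores
    // term freq, positions and payloads.
    static boolean mergeCurrent(boolean fieldOmitTf, boolean instanceOmitTf) {
        return fieldOmitTf && instanceOmitTf;
    }

    // The reverse policy asked about above: a single omitTf=true
    // instance makes the whole field omit term freq information.
    static boolean mergeReverse(boolean fieldOmitTf, boolean instanceOmitTf) {
        return fieldOmitTf || instanceOmitTf;
    }
}
```

Under the reverse policy the flag can only move from false to true, so
once a field is seen with omitTf=true there would be no need to buffer
its positions/payloads "just in case".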

With this patch we still store the *.prx bytes for a field with
omitTf=true.  Can you fix that?  I think in FreqProxTermsWriter you
can simply not write any bytes to the proxOut; likewise in
SegmentMerger and SegmentTermPositions, don't try to read bytes from
the prx file if omitTf==true.

I'd also be curious about what gains in index size & filter
performance we see with these new boolean fields.


> Make it possible not to include TF information in index
> -------------------------------------------------------
>
>                 Key: LUCENE-1340
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1340
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>            Reporter: Eks Dev
>            Priority: Minor
>         Attachments: LUCENE-1340.patch, LUCENE-1340.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Term frequency is typically not needed for all fields; some CPU (reading one 
> VInt less and one X>>>1...) and IO can be saved by making pure boolean fields 
> possible in Lucene. This topic has already been discussed and accepted as a 
> part of Flexible Indexing... This issue tries to push things forward a bit 
> faster, as I have some concrete customer demands.
> Benefits can be expected for fields that are typical candidates for Filters, 
> enumerations, user rights, IDs or very short "texts", phone numbers, zip 
> codes, names...
> Status: just passed the standard tests (compatibility), committed for early 
> review. I have not tried the new feature yet; some asserts and one or two 
> unit tests are still missing.
> Complexity: simpler than expected
> Can be used via omitTf() (whoever has used omitNorms() will know where to find it :)
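
The "one VInt less and one X>>>1" remark in the description can be
illustrated with a small stand-alone sketch of the postings encoding:
with term frequencies, the doc delta is shifted left by one bit and the
low bit flags freq == 1, otherwise the freq follows as a second VInt;
without them, the doc delta is written as is.  FreqEncodingSketch and
its methods are hypothetical names for illustration only, and this is
not the code from the patch.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch: names are illustrative, not taken from the patch.
final class FreqEncodingSketch {

    // Variable-length int: 7 data bits per byte, high bit set when
    // more bytes follow.
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    // With term freqs: the low bit of the doc code marks freq == 1;
    // any other freq is written as an extra VInt.
    static byte[] encodeWithTf(int docDelta, int freq) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (freq == 1) {
            writeVInt(out, (docDelta << 1) | 1);
        } else {
            writeVInt(out, docDelta << 1);
            writeVInt(out, freq);
        }
        return out.toByteArray();
    }

    // With omitTf: only the doc delta is written, so the reader needs
    // neither the shift nor the second VInt.
    static byte[] encodeOmitTf(int docDelta) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, docDelta);
        return out.toByteArray();
    }
}
```

Besides saving the freq VInt, dropping the shift leaves one more bit
for the doc delta itself, so deltas between 64 and 127 fit in one byte
instead of two.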

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


