[
https://issues.apache.org/jira/browse/LUCENE-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-1340:
---------------------------------------
Attachment: LUCENE-1340.patch
Thanks eks, that was fast -- I think you set a new record!
The patch looks good, though we definitely need some solid unit tests
here. I made some small (whitespace, spelling, naming) corrections &
attached a new rev of the patch.
One question I have: right now if a single field has mixed true/false
for omitTf, you set it to false, meaning we start storing the term
freq, pos, payloads again. Can/should we do the reverse instead? If
we did, we could make some further optimizations, e.g. right now we
consume RAM storing all positions/payloads on a field that has omitTf=true
on the possibility that we may still see omitTf=false in the same session.
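To make the two options concrete, here is a minimal sketch of the merge rule for a field's omitTf flag. The class and method names are hypothetical and are not taken from the patch; it only contrasts "any false wins" (the behavior described above) with "any true wins" (the reverse I am asking about).

```java
// Hypothetical illustration only; not code from the LUCENE-1340 patch.
final class OmitTfMergeSketch {

    /** Behavior described above: a single omitTf=false instance turns TF storage back on. */
    static boolean mergeCurrent(boolean fieldSoFar, boolean incoming) {
        return fieldSoFar && incoming; // stays true only if every instance omits TF
    }

    /** Proposed reverse: once any instance sets omitTf=true, the field keeps it. */
    static boolean mergeProposed(boolean fieldSoFar, boolean incoming) {
        return fieldSoFar || incoming;
    }

    public static void main(String[] args) {
        // omitTf flags seen for the same field name within one indexing session
        boolean[] seen = {true, true, false, true};
        boolean current = seen[0];
        boolean proposed = seen[0];
        for (int i = 1; i < seen.length; i++) {
            current = mergeCurrent(current, seen[i]);
            proposed = mergeProposed(proposed, seen[i]);
        }
        System.out.println("current=" + current + " proposed=" + proposed);
    }
}
```

Under the "any true wins" rule the flag can never flip back to false within a session, so the indexer would be free to stop buffering positions/payloads for that field as soon as it sees omitTf=true.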
With this patch we still store the *.prx bytes for a field with
omitTf=true. Can you fix that? I think in FreqProxTermsWriter you
can simply not write any bytes to the proxOut; likewise in
SegmentMerger and SegmentTermPositions, don't try to read bytes from
the prx file if omitTf==true.
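Roughly what I have in mind on the write side, as a sketch rather than the actual FreqProxTermsWriter code: the class below is a hypothetical in-memory stand-in, it ignores payloads, and it assumes positions are delta-encoded as VInts.

```java
// Hypothetical stand-in for the prox writing path; names are illustrative, not Lucene's.
final class ProxSkipSketch {
    // In-memory substitute for the *.prx output stream.
    final java.io.ByteArrayOutputStream proxOut = new java.io.ByteArrayOutputStream();

    /** Variable-length int: 7 bits per byte, high bit set on every byte except the last. */
    static void writeVInt(java.io.ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    /** Writes position deltas for one document; positions must be sorted ascending. */
    void addPositions(int[] positions, boolean omitTf) {
        if (omitTf) {
            return; // field omits TF: write nothing to proxOut
        }
        int last = 0;
        for (int p : positions) {
            writeVInt(proxOut, p - last);
            last = p;
        }
    }
}
```

The readers would need the mirror-image check, since for an omitTf field there would be no bytes in the prx file to consume.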
I'd also be curious about what gains in index size & filter
performance we see with these new boolean fields.
> Make it possible not to include TF information in index
> -------------------------------------------------------
>
> Key: LUCENE-1340
> URL: https://issues.apache.org/jira/browse/LUCENE-1340
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Index
> Reporter: Eks Dev
> Priority: Minor
> Attachments: LUCENE-1340.patch, LUCENE-1340.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Term frequency is typically not needed for all fields; some CPU (reading one
> VInt less and one X>>>1...) and IO can be saved by making pure boolean fields
> possible in Lucene. This topic has already been discussed and accepted as
> part of Flexible Indexing... This issue tries to push things forward a bit
> faster, as I have some concrete customer demands.
> Benefits can be expected for fields that are typical candidates for Filters,
> enumerations, user rights, IDs or very short "texts", phone numbers, zip
> codes, names...
> Status: just passed the standard tests (compatibility), committed for early
> review; I have not tried the new feature yet, and it is still missing some
> asserts and one or two unit tests.
> Complexity: simpler than expected.
> Can be used via omitTf() (whoever has used omitNorms() will know where to find it :)
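
The "one VInt less and one X>>>1" saving in the description above can be sketched as follows. This assumes the usual freq-stream encoding, where the doc delta is shifted left by one and the low bit flags freq == 1; the class is hypothetical, and each array element stands in for one already-decoded VInt.

```java
// Hypothetical decoder sketch; not Lucene's SegmentTermDocs.
final class FreqDecodeSketch {
    private final int[] vints; // each element stands for one VInt from the freq stream
    private int pos = 0;
    int vintsRead = 0;
    int doc = 0;
    int freq = 0;

    FreqDecodeSketch(int[] vints) {
        this.vints = vints;
    }

    private int readVInt() {
        vintsRead++;
        return vints[pos++];
    }

    /** Advances to the next posting. */
    void next(boolean omitTf) {
        if (omitTf) {
            doc += readVInt();   // plain doc delta: no shift, no freq to read
            freq = 1;
        } else {
            int code = readVInt();
            doc += code >>> 1;   // low bit is the "freq == 1" flag
            freq = (code & 1) != 0 ? 1 : readVInt();
        }
    }
}
```

For a posting with doc delta 5 and freq 3, the TF path reads two VInts (10, then 3), while the omitTf path reads only the single value 5.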