[
https://issues.apache.org/jira/browse/LUCENE-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir updated LUCENE-2621:
--------------------------------
Attachment: LUCENE-2621_tv_fi_si.patch
Attached is a new patch between trunk and branch. I think it's at a point where
it's ready for merging.
* term vectors and fieldinfos move to codec.
* segmentinfos is moved to codec (before you could only realistically tweak a
few things).
* term vectors are cut over to flex APIs.
* much better testing of term vectors in CheckIndex.
* added SimpleText impls of term vectors, fieldinfos, and segmentinfos.
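To give a feel for what a plain-text implementation of per-field metadata behind a codec-style hook might look like, here is a minimal sketch. It is not the code in the patch: `FieldMeta`, `FieldMetaFormat`, and `PlainTextFieldMetaFormat` are hypothetical names made up for illustration, the on-disk layout is invented, and it assumes field names contain no newlines.

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleTextSketch {
    // Hypothetical per-field metadata; NOT Lucene's FieldInfo class.
    static final class FieldMeta {
        final int number;
        final String name;
        final boolean storeTermVectors;

        FieldMeta(int number, String name, boolean storeTermVectors) {
            this.number = number;
            this.name = name;
            this.storeTermVectors = storeTermVectors;
        }
    }

    // Hypothetical hook: a codec would hand out one of these instead of
    // the index code hard-wiring a single binary format.
    interface FieldMetaFormat {
        String write(List<FieldMeta> fields);
        List<FieldMeta> read(String data);
    }

    // Human-readable variant: one "key value" pair per line.
    // Assumes field names do not contain newlines.
    static final class PlainTextFieldMetaFormat implements FieldMetaFormat {
        private static final String COUNT = "number of fields ";
        private static final String NAME = "  name ";
        private static final String NUMBER = "  number ";
        private static final String VECTORS = "  vectors ";

        @Override
        public String write(List<FieldMeta> fields) {
            StringBuilder sb = new StringBuilder();
            sb.append(COUNT).append(fields.size()).append('\n');
            for (FieldMeta f : fields) {
                sb.append(NAME).append(f.name).append('\n');
                sb.append(NUMBER).append(f.number).append('\n');
                sb.append(VECTORS).append(f.storeTermVectors).append('\n');
            }
            return sb.toString();
        }

        @Override
        public List<FieldMeta> read(String data) {
            String[] lines = data.split("\n");
            int count = Integer.parseInt(lines[0].substring(COUNT.length()));
            List<FieldMeta> out = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                int base = 1 + 3 * i; // three lines per field after the header
                String name = lines[base].substring(NAME.length());
                int number = Integer.parseInt(lines[base + 1].substring(NUMBER.length()));
                boolean tv = Boolean.parseBoolean(lines[base + 2].substring(VECTORS.length()));
                out.add(new FieldMeta(number, name, tv));
            }
            return out;
        }
    }

    public static void main(String[] args) {
        List<FieldMeta> fields = new ArrayList<>();
        fields.add(new FieldMeta(0, "title", false));
        fields.add(new FieldMeta(1, "body", true));
        FieldMetaFormat format = new PlainTextFieldMetaFormat();
        System.out.print(format.write(fields));
    }
}
```

The appeal of a text format like this is debuggability: the written data can be read in an editor, which helps when testing that a codec hook really controls everything that gets written.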
After this I would propose closing this issue and opening follow-up issues for:
* make a new more efficient term vector implementation for 4.0; the existing
one would go to preflex, and the preflex impl should reorder the terms correctly
to UTF8 order (this has been a bug in trunk all along, not caused here!)
* see if we can remove the global .fnx file completely, as it's not per-segment
and I'm not sure it's totally necessary; perhaps the field number consistency
can be achieved with another mechanism. Otherwise, we should add a codec
hack/hook at least so that preflexRW can write segments without .fnx files.
* make preflex implementations of the other various reader/writers so that our
4.0 impls are clean and don't contain backwards compatibility code, and so that
we have more realistic testing of backwards with PreFlexRW.
* allow adding offsets to the postings list impls, either via
startOffset()/endOffset() or via an attribute like term vectors do in this
patch, so that a D&PEnum (DocsAndPositionsEnum) can retrieve the offsets at a
position. This could make highlighting much faster without having to use
vectors.
* try to make a few other things, like deletes, extensible via codec
* figure out a good design to cut over norms to DocValues.
* add a SimpleTextDocValues, it's sorely needed.
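The UTF8-order item in the list above is easy to see with a small standalone sketch. It assumes the pre-flex term order corresponds to UTF-16 code unit order (what `String.compareTo` gives in Java), and compares it against unsigned UTF-8 byte order. `TermOrderDemo` and `compareUtf8` are names made up for this illustration, not Lucene APIs.

```java
import java.nio.charset.StandardCharsets;

public class TermOrderDemo {
    // Compare two strings by their UTF-8 encoding, byte by byte,
    // treating each byte as an unsigned value (0..255).
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int d = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        // A term that is a prefix of the other sorts first.
        return x.length - y.length;
    }

    public static void main(String[] args) {
        String bmp = "\uFF61";        // U+FF61: one UTF-16 code unit, UTF-8 bytes EF BD A1
        String supp = "\uD800\uDC00"; // U+10000: surrogate pair, UTF-8 bytes F0 90 80 80

        // UTF-16 code unit order: 0xD800 < 0xFF61, so U+10000 sorts first.
        System.out.println("UTF-16 order puts U+10000 first: " + (supp.compareTo(bmp) < 0));

        // UTF-8 byte order: 0xEF < 0xF0, so U+FF61 sorts first.
        System.out.println("UTF-8 order puts U+FF61 first: " + (compareUtf8(bmp, supp) < 0));
    }
}
```

The two orders only disagree when supplementary characters are compared against BMP characters at or above U+E000, which is why this kind of ordering bug can go unnoticed for a long time.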
> Extend Codec to handle also stored fields and term vectors
> ----------------------------------------------------------
>
> Key: LUCENE-2621
> URL: https://issues.apache.org/jira/browse/LUCENE-2621
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/index
> Affects Versions: 4.0
> Reporter: Andrzej Bialecki
> Assignee: Robert Muir
> Labels: gsoc2011, lucene-gsoc-11, mentor
> Attachments: LUCENE-2621.patch, LUCENE-2621_rote.patch,
> LUCENE-2621_tv_fi_si.patch
>
>
> Currently the Codec API handles only writing/reading of term-related data,
> while stored fields data and term frequency vector data writing/reading is
> handled elsewhere.
> I propose to extend the Codec API to handle this data as well.