[ https://issues.apache.org/jira/browse/LUCENE-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2621:
--------------------------------

    Attachment: LUCENE-2621_tv_fi_si.patch

Attached is a new patch (the diff between trunk and the branch). I think it's ready for merging.
* term vectors and fieldinfos move to the codec.
* segmentinfos moves to the codec as well (previously you could realistically tweak only a few things).
* term vectors are cut over to the flex APIs.
* much better testing of term vectors in CheckIndex.
* added SimpleText impls of term vectors, fieldinfos, and segmentinfos.

After this I would propose closing this issue and opening follow-up issues to:
* make a new, more efficient term vector implementation for 4.0. The existing one would move to preflex, and the preflex impl should reorder the terms correctly into UTF-8 order (this bug has been in trunk all along; it was not caused here!)
* see if we can remove the global .fnx file completely, since it's not per-segment and I'm not sure it's truly necessary; perhaps field number consistency can be achieved through another mechanism. Otherwise, we should at least add a codec hack/hook so that PreFlexRW can write segments without .fnx files.
* make preflex implementations of the various other readers/writers, so that our 4.0 impls are clean and free of backwards-compatibility code, and so that PreFlexRW gives us more realistic backwards-compatibility testing.
* allow the postings list impls to store offsets, exposed either as startOffset()/endOffset() or via an attribute as term vectors do in this patch, so that a D&PEnum (DocsAndPositionsEnum) can retrieve the offsets at each position. This could make highlighting much faster without having to use term vectors.
* try to make a few other things, like deletes, extensible via the codec.
* figure out a good design to cut norms over to DocValues.
* add a SimpleTextDocValues; it's sorely needed.
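The term-order bug in the first follow-up item only shows up with supplementary characters. Assuming, as that item implies, that the preflex term dictionary is ordered by UTF-16 code units (what Java's `String.compareTo` does) while the flex APIs expect UTF-8 byte order, the two orders agree everywhere except when a surrogate pair is compared with a BMP character in the range U+E000 to U+FFFF. The standalone sketch below uses only the JDK, not Lucene; the class and method names are hypothetical and chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class (not part of Lucene): contrasts UTF-16 code unit
// order with UTF-8 byte order for two single-character terms.
public class TermOrderDemo {

    // Compare two strings by their UTF-8 encodings, treating bytes as unsigned.
    // For well-formed strings this matches Unicode code point order.
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return x.length - y.length;
    }

    // String.compareTo compares UTF-16 code units (chars) lexicographically.
    static List<String> sortUtf16(List<String> terms) {
        List<String> sorted = new ArrayList<>(terms);
        sorted.sort(String::compareTo);
        return sorted;
    }

    // Sort by UTF-8 bytes, i.e. Unicode code point order.
    static List<String> sortUtf8(List<String> terms) {
        List<String> sorted = new ArrayList<>(terms);
        sorted.sort(TermOrderDemo::compareUtf8);
        return sorted;
    }

    // Render each term's first code point as U+XXXX for readable output.
    static String describe(List<String> terms) {
        StringBuilder sb = new StringBuilder();
        for (String t : terms) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(String.format("U+%04X", t.codePointAt(0)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+FB01 is a BMP character above the surrogate range (0xD800-0xDFFF).
        String ligature = new String(Character.toChars(0xFB01));
        // U+1F600 is a supplementary character, stored in UTF-16 as a surrogate pair.
        String emoji = new String(Character.toChars(0x1F600));
        List<String> terms = Arrays.asList(ligature, emoji);
        System.out.println("UTF-16 order: " + describe(sortUtf16(terms)));
        System.out.println("UTF-8 order:  " + describe(sortUtf8(terms)));
    }
}
```

Running it prints `UTF-16 order: U+1F600, U+FB01` followed by `UTF-8 order:  U+FB01, U+1F600`. The leading surrogate 0xD83D sorts below 0xFB01 as a UTF-16 code unit, but in UTF-8 U+FB01 encodes as EF AC 81 and U+1F600 as F0 9F 98 80, so the order flips. A reader that reorders terms therefore only needs to adjust around surrogate pairs.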
                
> Extend Codec to handle also stored fields and term vectors
> ----------------------------------------------------------
>
>                 Key: LUCENE-2621
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2621
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>    Affects Versions: 4.0
>            Reporter: Andrzej Bialecki 
>            Assignee: Robert Muir
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>         Attachments: LUCENE-2621.patch, LUCENE-2621_rote.patch, LUCENE-2621_tv_fi_si.patch
>
>
> Currently the Codec API handles only writing/reading of term-related data, while writing/reading of stored fields data and term frequency vector data is handled elsewhere.
> I propose to extend the Codec API to handle this data as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
