[jira] [Commented] (BLUR-30) Extend lucene 4 AppendingCodec and add a compression option for the field storage.

Aaron McCurry (JIRA) Fri, 19 Oct 2012 06:10:19 -0700

    [ 
https://issues.apache.org/jira/browse/BLUR-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480007#comment-13480007
 ]


Aaron McCurry commented on BLUR-30:
-----------------------------------

So a little background here: Blur used to have a CompressedFieldStoreDirectory 
that would compress the data being written to the FDT file (which is used to 
store fields).  It was a bit of a hack to implement, and in the latest version 
of Lucene this hack is no longer necessary.  Flexible indexing in Lucene 4 
allows us to implement our own Codec for storing all information.  This task 
is to re-implement the CompressedFieldStoreDirectory as an extension of 
AppendingCodec.

In my previous comment I spoke of using a built-in data structure for storing 
this information, like a SequenceFile.  If we were to use a SequenceFile, we 
would also need to create an index file for it.  Let me explain.  Documents 
are accessed by document id (an integer counting up from 0) per segment.  If 
we store the document id as the key and the document as the value of each 
key/value pair in the SequenceFile, then we would get the RECORD or BLOCK 
compressed storage for free.  However, finding a document by id would require 
a scan of the file, which would be very expensive.  So along with the 
SequenceFile we will need a second file for finding the location of each 
key/value pair in the SequenceFile, hence the "index" for the SequenceFile.
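
To make the two-file idea concrete, here is a minimal sketch in plain Java.  
It does not use Hadoop's SequenceFile or any Lucene API: the class name 
DocOffsetIndexSketch and its methods are hypothetical, and in-memory byte 
arrays stand in for the data file and the index file.  Compression is left 
out.  The only point is that a fixed-width offset entry per document id turns 
a lookup into a seek instead of a scan.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: an append-only data stream plus a fixed-width offset index. */
class DocOffsetIndexSketch {
    // Stands in for the data file (the SequenceFile in the comment above).
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    // Stands in for the second "index" file: one 8-byte offset per document id.
    private final ByteArrayOutputStream index = new ByteArrayOutputStream();
    private int nextDocId = 0;

    /** Appends a document, records where it starts, and returns its document id. */
    int append(String doc) {
        byte[] bytes = doc.getBytes(StandardCharsets.UTF_8);

        // Index entry: the offset at which this record begins in the data stream.
        byte[] entry = ByteBuffer.allocate(Long.BYTES).putLong(data.size()).array();
        index.write(entry, 0, entry.length);

        // Data record: a 4-byte length followed by the document bytes.
        byte[] header = ByteBuffer.allocate(Integer.BYTES).putInt(bytes.length).array();
        data.write(header, 0, header.length);
        data.write(bytes, 0, bytes.length);

        return nextDocId++;
    }

    /** Looks up a document by id using the index, without scanning earlier records. */
    String read(int docId) {
        if (docId < 0 || docId >= nextDocId) {
            throw new IllegalArgumentException("unknown document id: " + docId);
        }
        // Entry i of the index lives at byte position i * 8.
        ByteBuffer idx = ByteBuffer.wrap(index.toByteArray());
        int offset = (int) idx.getLong(docId * Long.BYTES);

        ByteBuffer dat = ByteBuffer.wrap(data.toByteArray());
        int length = dat.getInt(offset);
        byte[] bytes = new byte[length];
        dat.position(offset + Integer.BYTES);
        dat.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Because document ids are dense and start at 0 within a segment, the index 
needs no keys at all: the id itself is the position of the entry.  A real 
implementation would write both streams to files and would point the index at 
record or block boundaries of the compressed data rather than at raw bytes.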


                
> Extend lucene 4 AppendingCodec and add a compression option for the field 
> storage.
> ----------------------------------------------------------------------------------
>
>                 Key: BLUR-30
>                 URL: https://issues.apache.org/jira/browse/BLUR-30
>             Project: Apache Blur
>          Issue Type: Improvement
>            Reporter: Aaron McCurry
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
