[jira] [Commented] (BLUR-290) NRT Updates using RAMDirectory & Swap

Ravikumar (JIRA) Thu, 14 Nov 2013 06:28:32 -0800

    [ 
https://issues.apache.org/jira/browse/BLUR-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822475#comment-13822475
 ]


Ravikumar commented on BLUR-290:
--------------------------------

I had a slightly different take on this problem at least for point 1, which is

"1. Same Row is spread-out across segments contiguously in disk index."

I came up with a sample codec that does the following:

1. During flush of the segment, record the start-doc for each rowId.
    Ex: Let's say each row contains 500 records:
            RowID0 = 0-499
            RowID1 = 500-999
            RowID2 = 1000-1499 etc...
   a. A BitSet marking each start-doc-id [0, 500, 1000 etc...] as true,
      similar to PrimeDocCache, but per-segment.
   b. Apart from the BitSet, we also record the RowID for each start-doc-id
      in a "Row" file:
        [0 --> RowID0, 500 --> RowID1, 1000 --> RowID2 etc...]
   c. An index file holding every 128th rowId:
        [0 --> RowID0, 64000 (128*500) --> RowID128 etc...]

I have copied the CompressingStoredFields logic in Lucene for steps a and b,
with minor tweaks. This results in exactly one disk seek to locate the rowId
of a given docId.
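
A minimal in-memory sketch of the three structures from step 1, to make the
layout concrete. This is not the codec itself: the class name RowStartIndex,
its fields, and addRow are hypothetical, plain Java collections stand in for
the on-disk "Row" and index files, and java.util.BitSet stands in for the
per-segment BitSet.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical model of the per-segment structures recorded at flush time. */
final class RowStartIndex {
    // Assumption taken from step c: every 128th row goes into the sparse index.
    static final int INDEX_INTERVAL = 128;

    // (a) One bit set at the first docId of each row.
    final BitSet startDocs = new BitSet();
    // (b) Stand-in for the "Row" file: start docId and rowId, one entry per row.
    final List<Integer> rowStartDocIds = new ArrayList<>();
    final List<String> rowIds = new ArrayList<>();
    // (c) Stand-in for the index file: the entry of every 128th row.
    final List<Integer> sparseStartDocIds = new ArrayList<>();
    final List<String> sparseRowIds = new ArrayList<>();

    private int nextDocId = 0;

    /** Records one row of docCount contiguous documents, in docId order. */
    void addRow(String rowId, int docCount) {
        if (docCount <= 0) {
            throw new IllegalArgumentException("docCount must be positive");
        }
        int rowOrdinal = rowIds.size();
        startDocs.set(nextDocId);
        rowStartDocIds.add(nextDocId);
        rowIds.add(rowId);
        if (rowOrdinal % INDEX_INTERVAL == 0) {
            sparseStartDocIds.add(nextDocId);
            sparseRowIds.add(rowId);
        }
        nextDocId += docCount;
    }
}
```

With 500 documents per row, as in the example above, row 128 starts at
docId 64000 and is the second entry in the sparse index.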

2. During scoring, even a single additional disk seek per docId is costly,
so I have fronted this with an LRU cache.
     Ex: getRowIdForDoc(docId), where docId is obtained from the Scorer.

     // Start-doc-id of the row that contains this docId
     int startDoc = bitset.prevSetBit(docId);
     BytesRef ref = rowCache.get(startDoc);
     if (ref != null) {
        return ref;
     } else {
        // 1. Perform a binary search in the index file
        // 2. Do one disk seek to locate the start-doc/RowId for the
        //    incoming docId
        // 3. Add it to rowCache and return it
     }
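
The lookup in step 2 can be sketched as runnable Java under a few stated
assumptions. RowIdLookup and its diskSeeks counter are hypothetical names,
a TreeMap stands in for the binary search plus the single disk seek into the
"Row" file, java.util.BitSet.previousSetBit stands in for the
bitset.prevSetBit call above, String replaces BytesRef, and the LRU cache is
an access-ordered LinkedHashMap.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical LRU-fronted docId to rowId lookup. */
final class RowIdLookup {
    private final BitSet startDocs;
    // Stand-in for the on-disk "Row" file: start docId to rowId.
    private final TreeMap<Integer, String> rowFile;
    private final Map<Integer, String> rowCache;
    // Counts simulated disk seeks so the effect of the cache is visible.
    int diskSeeks = 0;

    RowIdLookup(BitSet startDocs, TreeMap<Integer, String> rowFile, final int cacheSize) {
        this.startDocs = startDocs;
        this.rowFile = rowFile;
        // Access-ordered map that evicts the least recently used entry.
        this.rowCache = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > cacheSize;
            }
        };
    }

    String getRowIdForDoc(int docId) {
        // Start-doc-id of the row that contains docId.
        int startDoc = startDocs.previousSetBit(docId);
        if (startDoc < 0) {
            throw new IllegalArgumentException("no row starts at or before docId " + docId);
        }
        String rowId = rowCache.get(startDoc);
        if (rowId != null) {
            return rowId; // cache hit: no disk access
        }
        diskSeeks++; // cache miss: simulated index search plus one seek
        rowId = rowFile.get(startDoc);
        rowCache.put(startDoc, rowId);
        return rowId;
    }
}
```

Because the cache is keyed by start-doc-id rather than docId, every document
of a row shares one cache entry, so consecutive hits within the same row cost
at most one seek.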

> NRT Updates using RAMDirectory & Swap
> -------------------------------------
>
>                 Key: BLUR-290
>                 URL: https://issues.apache.org/jira/browse/BLUR-290
>             Project: Apache Blur
>          Issue Type: New Feature
>    Affects Versions: experimental-dev
>            Reporter: Ravikumar
>         Attachments: BlurFlushingIndexWriter.java, BlurIndexTracker.java, 
> BlurRealTimeIndex.java, BlurRealTimeIndexWriter.java, 
> BlurRealTimeManager.java, BlurRealTimeManagerReopenThread.java, 
> RealTimeTransactionRecorder.java, SlabAllocator.java, SlabRAMDirectory.java, 
> SlabRAMFile.java, SlabRAMInputStream.java, SlabRAMOutputStream.java, 
> SortingMultiReader.java
>
>
> We have been discussing how to handle humongous rows in Blur (BLUR-220). 
> Explore the idea of using a RAMDirectory at the front, backed by a 
> persistent index.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
