[
https://issues.apache.org/jira/browse/BLUR-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822475#comment-13822475
]
Ravikumar commented on BLUR-290:
--------------------------------
I took a slightly different approach to this problem, at least for point 1:
"1. Same Row is spread-out across segments contiguously in disk index."
I came up with a sample codec that does the following:
1. During segment flush, record the start doc for each rowId.
Ex: Let's say each row contains 500 records:
RowID0 = 0-499
RowID1 = 500-999
RowID2 = 1000-1499 etc...
a. A BitSet marking each start-doc-id [0, 500, 1000, etc.] as true, similar
to PrimeDocCache, but per-segment.
b. Apart from the BitSet, we also record the RowID for each start-doc-id in
a "Row" file:
[0 --> RowID0, 500 --> RowID1, 1000 --> RowID2, etc.]
c. An index file containing every 128th rowId:
[0 --> RowID0, 64000 (128*500) --> RowID128, etc.]
I copied the CompressingStoredFields logic from Lucene for steps a and b, with
minor tweaks. This results in exactly one disk seek to locate the rowId of a
given docId.
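As a rough illustration of step 1, here is a minimal in-memory sketch of the
three structures (BitSet, "Row" file, and sparse index of every 128th row).
It is not the codec itself: the class name RowStartIndex, the methods addRow
and rowIdFor, and the use of plain lists in place of on-disk files are all
assumptions made for illustration, and java.util.BitSet.previousSetBit stands
in for the bitset.prevSetBit call used in this comment.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical in-memory stand-in for the per-segment row structures. */
final class RowStartIndex {
    static final int INDEX_INTERVAL = 128; // every 128th row is indexed

    // (a) one bit set at the first docId of each row
    private final BitSet startDocs = new BitSet();
    // (b) "Row" file: start docId and rowId, in flush order
    private final List<Integer> rowFileDocs = new ArrayList<>();
    private final List<String> rowFileIds = new ArrayList<>();
    // (c) sparse index: start docId of every 128th row
    private final List<Integer> sparseDocs = new ArrayList<>();

    /** Record one row at flush time; rows must arrive in increasing doc order. */
    void addRow(String rowId, int startDoc) {
        if (rowFileDocs.size() % INDEX_INTERVAL == 0) {
            sparseDocs.add(startDoc);
        }
        startDocs.set(startDoc);
        rowFileDocs.add(startDoc);
        rowFileIds.add(rowId);
    }

    /** First docId of the row that contains docId, or -1 if there is none. */
    int startDocFor(int docId) {
        return startDocs.previousSetBit(docId);
    }

    /** Resolve the rowId for docId via the sparse index and one block scan. */
    String rowIdFor(int docId) {
        int startDoc = startDocFor(docId);
        if (startDoc < 0) {
            throw new IllegalArgumentException("no row contains docId " + docId);
        }
        // Binary search for the last sparse entry whose start doc is <= startDoc.
        int lo = 0;
        int hi = sparseDocs.size() - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (sparseDocs.get(mid) <= startDoc) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        // Scan one block of at most 128 rows; on disk this would be the single seek.
        int from = lo * INDEX_INTERVAL;
        int to = Math.min(from + INDEX_INTERVAL, rowFileDocs.size());
        for (int i = from; i < to; i++) {
            if (rowFileDocs.get(i) == startDoc) {
                return rowFileIds.get(i);
            }
        }
        throw new IllegalStateException("start doc " + startDoc + " missing from row file");
    }
}
```

With 500 docs per row as in the example above, rowIdFor(64499) would first
resolve the start doc 64000 from the BitSet, then land on the sparse entry
for RowID128 and scan only that block.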
2. During scoring, even a single additional disk seek per docId is costly,
so I have fronted this lookup with an LRU cache.
Ex: getRowIdForDoc(docId), where docId is obtained from the Scorer.
// start-doc-id associated with this docId
int startDoc = bitset.prevSetBit(docId);
BytesRef ref = rowCache.get(startDoc);
if (ref != null) {
  return ref;
} else {
  // 1. Perform a binary search in the index file
  // 2. Do one disk seek to locate the start-doc/RowId for the incoming docId
  // 3. Add it to rowCache and return it
}
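The cache-fronted lookup above could be sketched roughly as follows. This is
only an illustration under stated assumptions: the class CachedRowIdLookup,
the diskLookup callback (standing in for the binary search plus one disk
seek), and the diskLookups counter are hypothetical, String replaces BytesRef
to keep the example free of Lucene dependencies, and the LRU behaviour comes
from java.util.LinkedHashMap in access order rather than from whatever cache
the actual codec uses. The sketch is not thread-safe.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical LRU-fronted rowId lookup keyed by a row's start docId. */
final class CachedRowIdLookup {
    private final BitSet startDocs;               // bit set at each row's first docId
    private final IntFunction<String> diskLookup; // assumed: index search + one seek
    private final Map<Integer, String> rowCache;  // start docId -> rowId, LRU order
    int diskLookups = 0;                          // counts cache misses, for illustration

    CachedRowIdLookup(BitSet startDocs, IntFunction<String> diskLookup, final int maxEntries) {
        this.startDocs = startDocs;
        this.diskLookup = diskLookup;
        // accessOrder = true makes iteration order least-recently-used first.
        this.rowCache = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > maxEntries; // evict once the cache exceeds its bound
            }
        };
    }

    String getRowIdForDoc(int docId) {
        // Start docId of the row that contains docId.
        int startDoc = startDocs.previousSetBit(docId);
        if (startDoc < 0) {
            throw new IllegalArgumentException("no row contains docId " + docId);
        }
        String rowId = rowCache.get(startDoc);
        if (rowId == null) {
            // Cache miss: fall back to the (simulated) disk lookup, then remember it.
            rowId = diskLookup.apply(startDoc);
            diskLookups++;
            rowCache.put(startDoc, rowId);
        }
        return rowId;
    }
}
```

Keying the cache by start docId rather than by docId means all records of a
row share a single entry, so a 500-record row costs at most one miss while it
stays in the cache.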
> NRT Updates using RAMDirectory & Swap
> -------------------------------------
>
> Key: BLUR-290
> URL: https://issues.apache.org/jira/browse/BLUR-290
> Project: Apache Blur
> Issue Type: New Feature
> Affects Versions: experimental-dev
> Reporter: Ravikumar
> Attachments: BlurFlushingIndexWriter.java, BlurIndexTracker.java,
> BlurRealTimeIndex.java, BlurRealTimeIndexWriter.java,
> BlurRealTimeManager.java, BlurRealTimeManagerReopenThread.java,
> RealTimeTransactionRecorder.java, SlabAllocator.java, SlabRAMDirectory.java,
> SlabRAMFile.java, SlabRAMInputStream.java, SlabRAMOutputStream.java,
> SortingMultiReader.java
>
>
> We have been discussing how to handle humungous rows in Blur (BLUR-220).
> Explore the idea of using a RAMDirectory at the front, backed by a
> persistent index.