[
https://issues.apache.org/jira/browse/BLUR-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818923#comment-13818923
]
Ravikumar commented on BLUR-290:
--------------------------------
On the read path, the current code in SuperQuery.java appears to score per row,
using the PrimeDocCache BitSet, and the search is wrapped with a
TopScoreDocsCollector in IterablePaging.java. Please correct me if I am wrong.
The situation is slightly different here:
1. In the disk index, the same row is spread across segments, but contiguously.
2. In the "N" RAM indexes, the same row can also be scattered across segments
non-contiguously.
If the query is "docs.content=hello AND rowid:123" or similar, the
implementation is straightforward.
But if the query is just "docs.content=hello", it becomes very difficult to
aggregate all records of a given row across segments and compute a correct
score.
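To make the aggregation problem concrete, here is a minimal sketch in plain
Java, not Blur or Lucene code. RecordHit, aggregate and topRows are hypothetical
names, and summing record scores to get a row score is an assumption made only
for illustration. The point is that hits for one row arrive from different
segments and must be merged by row id before any row can be ranked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RowScoreAggregator {

    /** Hypothetical stand-in for one matching record found in some segment. */
    static final class RecordHit {
        final String rowId;
        final String segment;
        final float score;

        RecordHit(String rowId, String segment, float score) {
            this.rowId = rowId;
            this.segment = segment;
            this.score = score;
        }
    }

    /** Sums record scores per row id, whichever segment produced each hit (assumed scoring rule). */
    static Map<String, Float> aggregate(List<RecordHit> hits) {
        Map<String, Float> rowScores = new HashMap<>();
        for (RecordHit hit : hits) {
            rowScores.merge(hit.rowId, hit.score, Float::sum);
        }
        return rowScores;
    }

    /** Returns the ids of the n best rows, highest aggregated score first. */
    static List<String> topRows(Map<String, Float> rowScores, int n) {
        List<Map.Entry<String, Float>> entries = new ArrayList<>(rowScores.entrySet());
        entries.sort((a, b) -> Float.compare(b.getValue(), a.getValue()));
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        // row-123 has one record on disk and two more scattered over RAM segments.
        List<RecordHit> hits = Arrays.asList(
                new RecordHit("row-123", "disk-seg-0", 1.5f),
                new RecordHit("row-456", "disk-seg-0", 2.0f),
                new RecordHit("row-123", "ram-seg-1", 1.0f),
                new RecordHit("row-123", "ram-seg-2", 0.5f));
        System.out.println(topRows(aggregate(hits), 2));
    }
}
```

The map holds one entry per matching row, so its memory use grows with the
number of matching rows. That is the cost a contiguous per-row layout avoids.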
I can think of the newly introduced grouping functionality in Lucene, where we
can group by "rowid", but that is extremely costly:
1. It involves the FieldCache.
2. There are two round trips: one to identify the top "N" rows and another to
identify the top "M" records for each of the "N" rows.
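The two round trips can be sketched as follows in plain Java. This models the
shape of a two-pass group-by and does not call Lucene's grouping module. Hit,
topRows and topRecords are hypothetical names, and ranking a row by its best
record score in the first pass is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TwoPassGrouping {

    /** Hypothetical stand-in for one matching record. */
    static final class Hit {
        final String rowId;
        final int docId;
        final float score;

        Hit(String rowId, int docId, float score) {
            this.rowId = rowId;
            this.docId = docId;
            this.score = score;
        }
    }

    /** Pass 1: rank rows by their best record score and keep the top n row ids. */
    static List<String> topRows(List<Hit> hits, int n) {
        Map<String, Float> best = new HashMap<>();
        for (Hit h : hits) {
            best.merge(h.rowId, h.score, Float::max);
        }
        List<Map.Entry<String, Float>> entries = new ArrayList<>(best.entrySet());
        entries.sort((a, b) -> Float.compare(b.getValue(), a.getValue()));
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            rows.add(entries.get(i).getKey());
        }
        return rows;
    }

    /** Pass 2: scan the hits again and keep the top m records of each selected row. */
    static Map<String, List<Hit>> topRecords(List<Hit> hits, List<String> rows, int m) {
        Map<String, List<Hit>> grouped = new LinkedHashMap<>();
        for (String row : rows) {
            grouped.put(row, new ArrayList<>());
        }
        for (Hit h : hits) {
            List<Hit> group = grouped.get(h.rowId);
            if (group != null) {
                group.add(h);
            }
        }
        for (List<Hit> group : grouped.values()) {
            group.sort((a, b) -> Float.compare(b.score, a.score));
            if (group.size() > m) {
                group.subList(m, group.size()).clear();
            }
        }
        return grouped;
    }
}
```

Both passes walk the full hit list, which is why the second round trip roughly
doubles the search cost in this model.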
Need some help here.
Or maybe, for a start, we can choose not to support queries without a "row-id"
when using this real-time system. [Something akin to a key-value store, where a
query without a key is not possible.]
> NRT Updates using RAMDirectory & Swap
> -------------------------------------
>
> Key: BLUR-290
> URL: https://issues.apache.org/jira/browse/BLUR-290
> Project: Apache Blur
> Issue Type: New Feature
> Affects Versions: experimental-dev
> Reporter: Ravikumar
> Attachments: BlurFlushingIndexWriter.java, BlurIndexTracker.java,
> BlurRealTimeIndex.java, BlurRealTimeIndexWriter.java,
> BlurRealTimeManager.java, BlurRealTimeManagerReopenThread.java,
> RealTimeTransactionRecorder.java, SlabAllocator.java, SlabRAMDirectory.java,
> SlabRAMFile.java, SlabRAMInputStream.java, SlabRAMOutputStream.java,
> SortingMultiReader.java
>
>
> We have been discussing how to handle humongous rows in Blur (BLUR-220).
> Explore the idea of using a RAMDirectory at the front, backed by a
> persistent index.