[ 
https://issues.apache.org/jira/browse/BLUR-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818923#comment-13818923
 ] 

Ravikumar commented on BLUR-290:
--------------------------------

On the read path, I looked at the current code in SuperQuery.java, where scoring 
is done per row using the PrimeDocCache BitSet and is wrapped with a 
TopScoreDocCollector in IterablePaging.java. Please correct me if I am wrong.

The situation is slightly different here. 

1. The same row is spread across segments, contiguously, in the disk index.

2. The same row can also be scattered non-contiguously across segments in "N" 
RAM indexes.

If a query looks like "docs.content=hello AND rowid:123", the implementation 
is straightforward.

But if a query is just "docs.content=hello", it becomes very difficult to 
aggregate all records across segments for a given row and compute a correct 
score.

One option I can think of is the newly introduced grouping functionality in 
Lucene, where we can group by "rowid", but that is extremely costly:

1. It involves the FieldCache.
2. There are 2 round-trips: one to identify the top "N" rows and another to 
identify the top "M" records for each of the "N" rows.
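
The two round-trips can be sketched with plain Java collections. This is not 
Lucene's grouping API: TwoPassRowGrouping, Rec, topRowIds and topRecords are 
hypothetical names, and ranking a row by its best record score is an 
assumption. The point is only that the full set of hits is walked twice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of two-pass grouping; not Lucene or Blur code.
class TwoPassRowGrouping {

    /** A matching record with its owning row id and its score. */
    static final class Rec {
        final String rowId;
        final String recordId;
        final float score;

        Rec(String rowId, String recordId, float score) {
            this.rowId = rowId;
            this.recordId = recordId;
            this.score = score;
        }
    }

    /** Pass 1: pick up to n row ids, ranked by each row's best record score. */
    static List<String> topRowIds(List<Rec> hits, int n) {
        Map<String, Float> best = new HashMap<>();
        for (Rec r : hits) {
            best.merge(r.rowId, r.score, Float::max);
        }
        List<String> ids = new ArrayList<>(best.keySet());
        ids.sort((a, b) -> Float.compare(best.get(b), best.get(a)));
        return new ArrayList<>(ids.subList(0, Math.min(n, ids.size())));
    }

    /** Pass 2: walk the hits again, keeping up to m best records per chosen row. */
    static Map<String, List<Rec>> topRecords(List<Rec> hits, List<String> rowIds, int m) {
        // LinkedHashMap keeps the row order decided in pass 1.
        Map<String, List<Rec>> groups = new LinkedHashMap<>();
        for (String id : rowIds) {
            groups.put(id, new ArrayList<>());
        }
        for (Rec r : hits) {
            List<Rec> group = groups.get(r.rowId);
            if (group != null) {
                group.add(r);
            }
        }
        for (Map.Entry<String, List<Rec>> e : groups.entrySet()) {
            List<Rec> group = e.getValue();
            group.sort((a, b) -> Float.compare(b.score, a.score));
            e.setValue(new ArrayList<>(group.subList(0, Math.min(m, group.size()))));
        }
        return groups;
    }
}
```

In a real index each pass would be a separate search over all segments, which 
is where the cost comes from.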

I need some help here.

Or maybe, for a start, we can choose not to support queries without a 
"row-id" when using this real-time system. [Something akin to a key-value 
store, where a query without a key is not possible.] 

> NRT Updates using RAMDirectory & Swap
> -------------------------------------
>
>                 Key: BLUR-290
>                 URL: https://issues.apache.org/jira/browse/BLUR-290
>             Project: Apache Blur
>          Issue Type: New Feature
>    Affects Versions: experimental-dev
>            Reporter: Ravikumar
>         Attachments: BlurFlushingIndexWriter.java, BlurIndexTracker.java, 
> BlurRealTimeIndex.java, BlurRealTimeIndexWriter.java, 
> BlurRealTimeManager.java, BlurRealTimeManagerReopenThread.java, 
> RealTimeTransactionRecorder.java, SlabAllocator.java, SlabRAMDirectory.java, 
> SlabRAMFile.java, SlabRAMInputStream.java, SlabRAMOutputStream.java, 
> SortingMultiReader.java
>
>
> We have been discussing how to handle humungous rows in Blur (BLUR-220). 
> This issue explores the idea of using a RAMDirectory at the front, backed 
> by a persistent index.


