[ 
https://issues.apache.org/jira/browse/BLUR-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796792#comment-13796792
 ] 

Ravikumar commented on BLUR-220:
--------------------------------

One thing I want to bring to your attention in the perf chart:

[No. of unique row IDs,
 total no. of docs]      Index type                                   Slow-Query
IDs = 1M, Docs = 3M      Optimized index                              2721 ms
IDs = 1M, Docs = 3M      Merge-Sorted index, with early termination   4486 ms

The Slow-Query, which is actually a "common-term" search, is roughly 1.65X 
slower on the Merge-Sorted index (4486 ms vs. 2721 ms).

In another variation of the test, I filter-cached the RowId before 
actually querying (just a bit-set cache of the docids).

[No. of unique row IDs,
 total no. of docs]      Index type                                   Slow-Query
IDs = 1M, Docs = 3M      Optimized index                              2588 ms
IDs = 1M, Docs = 3M      Merge-Sorted index, with early termination   1335 ms

With this approach the Merge-Sorted index got roughly 3.4X faster (4486 ms 
down to 1335 ms) and even outperforms the optimized index (2588 ms).
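
To make the filter-cache idea concrete, here is a minimal sketch in plain 
Java. It is not the code from the attached TestSearch.java and it does not use 
the Lucene API. It assumes the query is restricted to a single row, that docs 
are sorted by row id (so a row's docs are contiguous), and that the postings 
of the common term arrive in ascending docid order. The class and method names 
(FilterCacheSketch, filterFor, search) are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a row-id filter cache over an index sorted by row id.
class FilterCacheSketch {
    // rowIdOfDoc[docId] is the row id of that doc (assumed sorted by row id).
    private final String[] rowIdOfDoc;
    // Hypothetical cache: row id -> bit set of the docids in that row.
    private final Map<String, BitSet> cache = new HashMap<>();

    FilterCacheSketch(String[] rowIdOfDoc) {
        this.rowIdOfDoc = rowIdOfDoc;
    }

    // Build the bit set for a row once, then reuse it on later queries.
    BitSet filterFor(String rowId) {
        return cache.computeIfAbsent(rowId, id -> {
            BitSet bits = new BitSet(rowIdOfDoc.length);
            for (int doc = 0; doc < rowIdOfDoc.length; doc++) {
                if (id.equals(rowIdOfDoc[doc])) {
                    bits.set(doc);
                }
            }
            return bits;
        });
    }

    // postings: ascending docids that match the common term.
    List<Integer> search(int[] postings, String rowId) {
        BitSet filter = filterFor(rowId);
        int lastDocInRow = filter.length() - 1; // -1 if the row has no docs
        List<Integer> hits = new ArrayList<>();
        for (int doc : postings) {
            if (doc > lastDocInRow) {
                break; // early termination: past the row's last doc
            }
            if (filter.get(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

The point of the sketch is only that the cached bit set turns the row 
restriction into a cheap bit test, and the sorted order lets the loop stop as 
soon as it is past the row's last doc. It says nothing about where the 
measured speedup actually comes from.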

> Support for humongous Rows
> --------------------------
>
>                 Key: BLUR-220
>                 URL: https://issues.apache.org/jira/browse/BLUR-220
>             Project: Apache Blur
>          Issue Type: Improvement
>          Components: Blur
>    Affects Versions: 0.3.0
>            Reporter: Aaron McCurry
>             Fix For: 0.3.0
>
>         Attachments: Blur_Query_Perf_Chart1.pdf, CreateIndex.java, 
> CreateIndex.java, CreateSortedIndex.java, MyEarlyTerminatingCollector.java, 
> test_results.txt, TestSearch.java, TestSearch.java
>
>
> One of the limitations of Blur is the size of the Rows stored, specifically
> the number of Records.  Currently, updates are performed in Lucene by
> deleting the document and re-adding it to the index.  Unfortunately, when any
> update is performed on a Row in Blur, the entire Row has to be re-read (if
> the RowMutationType is UPDATE_ROW), the needed modifications are made, and
> then the Row is reindexed in its entirety.
> Due to all of this overhead, there is a realistic limit on the size of a
> given Row.  It may vary based on the kind of hardware being used, but as the
> Row grows in size, indexing (mutations) against that Row will slow down.
> This issue is being created to discuss techniques for dealing with this
> problem.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
