[jira] [Commented] (BLUR-220) Support for humongous Rows

Aaron McCurry (JIRA) Wed, 16 Oct 2013 03:14:03 -0700

    [ 
https://issues.apache.org/jira/browse/BLUR-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796630#comment-13796630
 ]


Aaron McCurry commented on BLUR-220:
------------------------------------

Ravi,

I have been taking some time to digest your results.  I believe that in most 
cases anything less than 50 ms will be acceptable.  However, for this feature 
to work at the scale that Blur currently operates at, we are obviously going 
to need a different approach than just the plain mixed index.

I think that we are going to need some sort of mixed approach, with one mode 
where the index is in a normal state and another where it is in this dual pass 
mode.  The biggest problem I see with this approach is how to get the entire 
row back together again during merges, when the row is spread across segments 
and we don't want to have to do a full optimization (down to 1 segment).

I will do some more thinking on this one.

Aaron

> Support for humongous Rows
> --------------------------
>
>                 Key: BLUR-220
>                 URL: https://issues.apache.org/jira/browse/BLUR-220
>             Project: Apache Blur
>          Issue Type: Improvement
>          Components: Blur
>    Affects Versions: 0.3.0
>            Reporter: Aaron McCurry
>             Fix For: 0.3.0
>
>         Attachments: Blur_Query_Perf_Chart1.pdf, CreateIndex.java, 
> CreateIndex.java, CreateSortedIndex.java, MyEarlyTerminatingCollector.java, 
> test_results.txt, TestSearch.java, TestSearch.java
>
>
> One of the limitations of Blur is the size of Rows stored, specifically the 
> number of Records.  Updates in Lucene are currently performed by 
> deleting the document and re-adding it to the index.  Unfortunately, when any 
> update is performed on a Row in Blur, the entire Row has to be re-read (if the 
> RowMutationType is UPDATE_ROW), whatever modifications are needed are made, 
> and then it is reindexed in its entirety.
> Due to all of this overhead, there is a realistic limit on the size of a 
> given Row.  It may vary based on the kind of hardware being used, but as the 
> Row grows in size, the indexing (mutations) against that Row will slow.
> This issue is being created to discuss techniques for dealing with this 
> problem.
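
To make the overhead described above concrete, here is a minimal sketch of the 
re-read, modify, reindex cycle.  It does not use the Blur or Lucene APIs; the 
class RowRewriteSketch, its methods, and the in-memory list standing in for 
the index are all hypothetical, and the only assumption is that every mutation 
rewrites every Record of the Row.

```java
import java.util.ArrayList;
import java.util.List;

public class RowRewriteSketch {
    // Hypothetical counter standing in for index work: one unit per Record written.
    static long recordsIndexed = 0;

    // Models one UPDATE_ROW-style mutation that appends a single Record.
    static List<String> appendRecord(List<String> storedRow, String newRecord) {
        List<String> rebuilt = new ArrayList<>(storedRow); // re-read the entire Row
        rebuilt.add(newRecord);                            // apply the modification
        recordsIndexed += rebuilt.size();                  // delete + re-add every Record
        return rebuilt;
    }

    // Total Records written while growing a Row to n Records, one mutation at a time.
    static long costOfBuildingRow(int n) {
        recordsIndexed = 0;
        List<String> row = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            row = appendRecord(row, "record-" + i);
        }
        return recordsIndexed;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.println(n + " records -> " + costOfBuildingRow(n) + " record writes");
        }
    }
}
```

Under that assumption, growing a Row to n Records one mutation at a time 
costs n(n+1)/2 Record writes in this model, which is why per-mutation cost 
climbs as the Row grows.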



--
This message was sent by Atlassian JIRA
(v6.1#6144)
