[jira] [Commented] (BLUR-220) Support for humongous Rows

Aaron McCurry (JIRA) Thu, 17 Oct 2013 05:31:40 -0700

    [ https://issues.apache.org/jira/browse/BLUR-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797839#comment-13797839 ]


Aaron McCurry commented on BLUR-220:
------------------------------------

A row query queries all the records within a single row, that is, all the
records whose rowids are equal to one another.

And yes, the requirement that they be back-to-back (stored contiguously in the
index) is strictly for performance.

https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=blur-query/src/main/java/org/apache/blur/lucene/search/SuperQuery.java;h=6939bd68e890e33b1e812769817e91837b502a17;hb=515b09a002cbbb67c1ed22af90303a5f69135eb0
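To make the row-query semantics and the back-to-back requirement concrete, here
is a simplified model in plain Java (16+, for `record`). It is not Blur's
`SuperQuery` and uses no Lucene APIs. The class `RowQuerySketch`, the `Rec` type
and its `family`/`value` fields are hypothetical. A row is modelled as matching
when every clause is satisfied by some record of that row (possibly a different
record per clause). Because a row's records are assumed to be stored
back-to-back, row boundaries are found in a single linear pass by watching for
a change in rowid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class RowQuerySketch {
    // Hypothetical stand-in for one indexed record (one document) in a row.
    record Rec(String rowId, String family, String value) {}

    // Returns the rowIds of rows in which every clause is satisfied by at least
    // one record of that row (possibly different records per clause).
    // Assumes all records of a row are stored back-to-back, so a row ends
    // exactly where the rowId changes.
    static List<String> rowQuery(List<Rec> index, List<Predicate<Rec>> clauses) {
        List<String> hits = new ArrayList<>();
        int i = 0;
        while (i < index.size()) {
            String rowId = index.get(i).rowId();
            boolean[] satisfied = new boolean[clauses.size()];
            // Consume the contiguous run of records belonging to this row.
            while (i < index.size() && index.get(i).rowId().equals(rowId)) {
                Rec rec = index.get(i);
                for (int c = 0; c < clauses.size(); c++) {
                    if (clauses.get(c).test(rec)) {
                        satisfied[c] = true;
                    }
                }
                i++;
            }
            // The row matches only if every clause was satisfied within it.
            boolean allSatisfied = true;
            for (boolean s : satisfied) {
                allSatisfied &= s;
            }
            if (allSatisfied) {
                hits.add(rowId);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Records in index order: each row's records are adjacent.
        List<Rec> index = List.of(
                new Rec("row1", "person", "alice"),
                new Rec("row1", "order", "book"),
                new Rec("row2", "person", "alice"),
                new Rec("row2", "order", "pen"),
                new Rec("row3", "order", "book"));
        List<Predicate<Rec>> clauses = List.of(
                r -> r.family().equals("person") && r.value().equals("alice"),
                r -> r.family().equals("order") && r.value().equals("book"));
        System.out.println(rowQuery(index, clauses)); // prints [row1]
    }
}
```

Here only row1 matches, and its two clauses are satisfied by two different
records. If a row's records were scattered through the index, each row would
need a lookup or a sort before it could be evaluated. That is the cost the
back-to-back layout avoids.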

I haven't had time to look at your code yet; I will try to tonight.  By the
sound of it, you are heading down the road I was thinking about, but I fear
that at scale the time to merge the filter segment will be very large.  So let
me take a look and play with the code a bit.

Thanks!

Aaron

> Support for humongous Rows
> --------------------------
>
>                 Key: BLUR-220
>                 URL: https://issues.apache.org/jira/browse/BLUR-220
>             Project: Apache Blur
>          Issue Type: Improvement
>          Components: Blur
>    Affects Versions: 0.3.0
>            Reporter: Aaron McCurry
>             Fix For: 0.3.0
>
>         Attachments: Blur_Query_Perf_Chart1.pdf, CreateIndex.java, 
> CreateIndex.java, CreateSortedIndex.java, FullRowReindexing.java, 
> MyEarlyTerminatingCollector.java, test_results.txt, TestSearch.java, 
> TestSearch.java
>
>
> One of the limitations of Blur is the size of the Rows stored, specifically
> the number of Records.  Currently, updates are performed in Lucene by
> deleting the document and re-adding it to the index.  Unfortunately, when any
> update is performed on a Row in Blur, the entire Row has to be re-read (if the
> RowMutationType is UPDATE_ROW), whatever modifications are needed are made,
> and then the Row is reindexed in its entirety.
> Due to all of this overhead, there is a realistic limit on the size of a
> given Row.  The limit may vary with the hardware being used, but as a Row
> grows in size, indexing (mutations) against that Row will slow down.
> This issue is being created to discuss techniques for dealing with this
> problem.
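The overhead described in the issue can be seen in a toy model. This is a
minimal sketch in plain Java (16+), assuming a delete-and-re-add update path
like the one described for UPDATE_ROW. `RowUpdateCostSketch`, `Rec`,
`updateRecord`, and the documents-written counter are hypothetical and are not
Blur or Lucene APIs. The sketch changes a single record and counts how many
documents get reindexed as a result.

```java
import java.util.ArrayList;
import java.util.List;

public class RowUpdateCostSketch {
    // Hypothetical stand-in for one indexed record (one document) in a row.
    record Rec(String rowId, String recordId, String value) {}

    // Toy in-memory "index": documents in index order.
    private final List<Rec> docs = new ArrayList<>();
    // Number of documents (re)indexed so far, used as a rough cost measure.
    private long docsWritten = 0;

    void addRow(List<Rec> records) {
        docs.addAll(records);           // keep the row's records back-to-back
        docsWritten += records.size();
    }

    // Models the update path described in the issue: re-read the whole row,
    // apply the change, delete every document of the row, re-add them all.
    void updateRecord(String rowId, String recordId, String newValue) {
        List<Rec> updated = new ArrayList<>();
        for (Rec r : docs) {                         // 1. re-read the entire row
            if (r.rowId().equals(rowId)) {
                // 2. apply the modification to the one targeted record
                updated.add(r.recordId().equals(recordId)
                        ? new Rec(rowId, recordId, newValue) : r);
            }
        }
        docs.removeIf(r -> r.rowId().equals(rowId)); // 3. delete the old documents
        addRow(updated);                             // 4. reindex the row in its entirety
    }

    // Documents rewritten when one record changes in a row of rowSize records.
    static long costOfSingleUpdate(int rowSize) {
        RowUpdateCostSketch index = new RowUpdateCostSketch();
        List<Rec> row = new ArrayList<>();
        for (int i = 0; i < rowSize; i++) {
            row.add(new Rec("row1", "rec" + i, "v" + i));
        }
        index.addRow(row);
        long before = index.docsWritten;
        index.updateRecord("row1", "rec0", "changed");
        return index.docsWritten - before;
    }

    public static void main(String[] args) {
        for (int size : new int[] {10, 1_000, 100_000}) {
            System.out.println("row size " + size + " -> "
                    + costOfSingleUpdate(size) + " documents rewritten");
        }
    }
}
```

Changing one record rewrites 10, 1,000, and 100,000 documents respectively. In
this model the work per mutation is proportional to the Row's size rather than
to the size of the change, which is why mutations against humongous Rows slow
down.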



--
This message was sent by Atlassian JIRA
(v6.1#6144)
