[ https://issues.apache.org/jira/browse/BLUR-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792628#comment-13792628 ]
Ravikumar Govindarajan commented on BLUR-220:
---------------------------------------------
http://www.mail-archive.com/[email protected]/msg00141.html
As an alternative to re-reading humongous rows on every update, it would be
helpful for Blur to scatter incoming records across segments, rather than
co-locate all records for a row every time.
As discussed in the link above, a BlurSortingMergePolicy can be used to
aggregate all records in RowId order during a segment merge, which should
offset much of the performance loss caused by scattering.
Apps can make use of this by extending BlurSortingMergePolicy.
Ex: Sort by RowId + Doc-Creation-Time
This will allow apps to do early query termination per sorted segment, per
row, etc.
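
To make the idea concrete, here is a rough, self-contained sketch in plain
Java. It does not use Blur or Lucene classes: Rec, mergeSorted and firstN are
hypothetical names made up for illustration, and the "merge" is simulated with
an in-memory sort rather than a real merge policy. It only shows why a segment
sorted by RowId + creation time lets a per-row scan stop early.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustration only: simulates sorted segments with in-memory lists. */
final class SortedSegmentSketch {

    /** Hypothetical stand-in for a record; not a Blur class. */
    static final class Rec {
        final String rowId;
        final long createdAt;

        Rec(String rowId, long createdAt) {
            this.rowId = rowId;
            this.createdAt = createdAt;
        }
    }

    /** The assumed sort order: RowId first, then creation time. */
    static final Comparator<Rec> ROW_THEN_TIME =
            Comparator.comparing((Rec r) -> r.rowId)
                      .thenComparingLong(r -> r.createdAt);

    /** Simulates a sorting merge: concatenate scattered segments, then sort. */
    static List<Rec> mergeSorted(List<List<Rec>> segments) {
        List<Rec> merged = new ArrayList<>();
        for (List<Rec> segment : segments) {
            merged.addAll(segment);
        }
        merged.sort(ROW_THEN_TIME);
        return merged;
    }

    /** Returns up to 'limit' oldest records of a row from a sorted segment. */
    static List<Rec> firstN(List<Rec> sortedSegment, String rowId, int limit) {
        // Binary search for the first record whose rowId is >= the target.
        int lo = 0;
        int hi = sortedSegment.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedSegment.get(mid).rowId.compareTo(rowId) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        List<Rec> hits = new ArrayList<>();
        for (int i = lo; i < sortedSegment.size() && hits.size() < limit; i++) {
            Rec r = sortedSegment.get(i);
            if (!r.rowId.equals(rowId)) {
                break; // left the row's contiguous run: terminate early
            }
            hits.add(r);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Records for rows "a" and "b" arrive scattered across two segments.
        List<List<Rec>> segments = Arrays.asList(
                Arrays.asList(new Rec("b", 5L), new Rec("a", 9L)),
                Arrays.asList(new Rec("a", 2L), new Rec("b", 1L), new Rec("a", 4L)));
        List<Rec> merged = mergeSorted(segments);
        for (Rec r : firstN(merged, "a", 2)) {
            System.out.println(r.rowId + "@" + r.createdAt);
        }
    }
}
```

In a real index the scan would run once per sorted segment and the per-segment
results would still need to be combined, since records for a row can live in
several segments until they are merged.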
> Support for humongous Rows
> --------------------------
>
> Key: BLUR-220
> URL: https://issues.apache.org/jira/browse/BLUR-220
> Project: Apache Blur
> Issue Type: Improvement
> Components: Blur
> Affects Versions: 0.3.0
> Reporter: Aaron McCurry
> Fix For: 0.3.0
>
>
> One of the limitations of Blur is the size of Rows stored, specifically the
> number of Records. Updates in Lucene are currently performed by
> deleting the document and re-adding it to the index. Unfortunately, when any
> update is performed on a Row in Blur, the entire Row has to be re-read (if the
> RowMutationType is UPDATE_ROW), the required modifications are made, and
> then the Row is reindexed in its entirety.
> Due to all of this overhead, there is a realistic limit on the size of a
> given Row. It may vary based on the kind of hardware that is being used, but
> as the Row grows in size, the indexing (mutations) against that Row will slow.
> This issue is being created to discuss techniques on how to deal with this
> problem.