[jira] [Updated] (BLUR-220) Support for humongous Rows

Ravikumar (JIRA) Thu, 24 Oct 2013 06:52:37 -0700

     [ 
https://issues.apache.org/jira/browse/BLUR-220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ravikumar updated BLUR-220:
---------------------------

    Attachment: SlabRAMFile.java
                SlabRAMInputStream.java
                SlabAllocator.java
                SlabRAMOutputStream.java
                SlabRAMDirectory.java

OK, a few things that I gathered.

If we are to use a RAM-based directory, then it's definitely not going to be 
Lucene's RAMDirectory. Even its Javadocs carry warnings against it.

I quickly grabbed the SlabAllocator from Cassandra (which in turn was taken 
from HBase). It doles out 1 MB byte[] slabs, which I wrap in Lucene's BytesRef. 
Each RAMFile contains N BytesRef chunks, with chunk-size=64KB.
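
To make the layout above concrete, here is a minimal sketch of the idea. It is 
not the attached patch: the class names SlabChunkAllocator and ChunkedRamFile 
are hypothetical, java.nio.ByteBuffer slices stand in for Lucene's BytesRef so 
the example has no dependencies, and a plain synchronized method replaces 
whatever concurrency scheme the real SlabAllocator uses. Only the 1 MB slab 
size and 64KB chunk size come from the description above.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical allocator: carves fixed-size chunks out of large slabs. */
final class SlabChunkAllocator {
    static final int SLAB_SIZE = 1 << 20;   // 1 MB per slab
    static final int CHUNK_SIZE = 64 << 10; // 64 KB per chunk

    private byte[] currentSlab;
    private int offset = SLAB_SIZE; // forces a slab allocation on first use
    private int slabCount;

    /** Returns a 64 KB view into the current slab, starting a new slab when full. */
    synchronized ByteBuffer allocateChunk() {
        if (offset + CHUNK_SIZE > SLAB_SIZE) {
            currentSlab = new byte[SLAB_SIZE];
            offset = 0;
            slabCount++;
        }
        // slice() yields a buffer whose index 0 maps to currentSlab[offset].
        ByteBuffer chunk = ByteBuffer.wrap(currentSlab, offset, CHUNK_SIZE).slice();
        offset += CHUNK_SIZE;
        return chunk;
    }

    synchronized int slabCount() {
        return slabCount;
    }
}

/** Hypothetical in-memory file made of N chunks; grows one chunk at a time. */
final class ChunkedRamFile {
    private final SlabChunkAllocator allocator;
    private final List<ByteBuffer> chunks = new ArrayList<>();
    private long length;

    ChunkedRamFile(SlabChunkAllocator allocator) {
        this.allocator = allocator;
    }

    /** Appends one byte, requesting a new chunk when the last one is full. */
    void writeByte(byte b) {
        int chunkIndex = (int) (length / SlabChunkAllocator.CHUNK_SIZE);
        int chunkOffset = (int) (length % SlabChunkAllocator.CHUNK_SIZE);
        if (chunkIndex == chunks.size()) {
            chunks.add(allocator.allocateChunk());
        }
        chunks.get(chunkIndex).put(chunkOffset, b);
        length++;
    }

    /** Reads the byte at an absolute file position. */
    byte readByte(long position) {
        if (position < 0 || position >= length) {
            throw new IndexOutOfBoundsException("position " + position);
        }
        int chunkIndex = (int) (position / SlabChunkAllocator.CHUNK_SIZE);
        int chunkOffset = (int) (position % SlabChunkAllocator.CHUNK_SIZE);
        return chunks.get(chunkIndex).get(chunkOffset);
    }

    long length() {
        return length;
    }

    int chunkCount() {
        return chunks.size();
    }
}
```

With these sizes, sixteen chunks share one slab, so the heap holds a small 
number of large, long-lived arrays instead of many small short-lived ones.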

I believe it should be both GC-friendly for a few GB of RAM and quite 
performant under concurrent load. Patch attached.

Throughout the Blur code, I see a "waitTobeVisible" flag that instructs 
NRTManager to wait until that generation is visible. How should I understand 
that in the context of a RAMDirectory backed by an HDFSDirectory? What would be 
the correct way to approach this?

> Support for humongous Rows
> --------------------------
>
>                 Key: BLUR-220
>                 URL: https://issues.apache.org/jira/browse/BLUR-220
>             Project: Apache Blur
>          Issue Type: Improvement
>          Components: Blur
>    Affects Versions: 0.3.0
>            Reporter: Aaron McCurry
>             Fix For: 0.3.0
>
>         Attachments: Blur_Query_Perf_Chart1.pdf, CreateIndex.java, 
> CreateIndex.java, CreateSortedIndex.java, FullRowReindexing.java, 
> MyEarlyTerminatingCollector.java, SlabAllocator.java, SlabRAMDirectory.java, 
> SlabRAMFile.java, SlabRAMInputStream.java, SlabRAMOutputStream.java, 
> test_results.txt, TestSearch.java, TestSearch.java
>
>
> One of the limitations of Blur is the size of the Rows stored, specifically 
> the number of Records.  Updates in Lucene are currently performed by 
> deleting the document and re-adding it to the index.  Unfortunately, when any 
> update is performed on a Row in Blur, the entire Row has to be re-read (if the 
> RowMutationType is UPDATE_ROW), whatever modifications are needed are made, 
> and then it is reindexed in its entirety.
> Due to all of this overhead, there is a realistic limit on the size of a 
> given Row.  It may vary based on the kind of hardware being used, but as the 
> Row grows in size, indexing (mutations) against that Row will slow.
> This issue is being created to discuss techniques for dealing with this 
> problem.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
