[ 
https://issues.apache.org/jira/browse/BLUR-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864199#comment-13864199
 ] 

Aaron McCurry commented on BLUR-290:
------------------------------------

Ravikumar,

Based on our discussion here, I started investigating why the HDFS directory
was so slow.  In some micro benchmarks I timed a single-document update commit
on a RAMDirectory at around 1 ms (the precommit phase at 0.6 ms and the final
commit phase at 0.08 ms).  The same test run with the HdfsDirectory took
between 160 ms and 200 ms for the commit.  After some more investigation I
found that much of the slowness was due to HDFS metadata calls, specifically
FileStatus calls.  I also found that caching the InputStreams to files helped
a lot.
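
To illustrate the metadata-caching idea, here is a minimal sketch. It is not
the actual Blur code: the class name FileLengthCache and the remoteLookup
function (a stand-in for an expensive FileStatus call) are assumptions made
for the example. It relies on Lucene index files being write-once, so a
cached length stays valid until the file is deleted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.ToLongFunction;

/** Hypothetical cache: remembers file lengths so repeated lookups skip the slow metadata call. */
class FileLengthCache {
    private final Map<String, Long> lengths = new ConcurrentHashMap<>();
    // Stand-in for the expensive remote call (e.g. fetching a FileStatus); an assumption.
    private final ToLongFunction<String> remoteLookup;

    FileLengthCache(ToLongFunction<String> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    /** Returns the cached length, asking the remote side only on the first request. */
    long fileLength(String name) {
        return lengths.computeIfAbsent(name, n -> remoteLookup.applyAsLong(n));
    }

    /** Records the length of a file this process just wrote, avoiding any remote call. */
    void fileWritten(String name, long length) {
        lengths.put(name, length);
    }

    /** Drops the entry when the file is deleted. */
    void fileDeleted(String name) {
        lengths.remove(name);
    }

    public static void main(String[] args) {
        AtomicInteger remoteCalls = new AtomicInteger();
        FileLengthCache cache = new FileLengthCache(name -> {
            remoteCalls.incrementAndGet(); // count how often the slow path is taken
            return 42L;
        });
        for (int i = 0; i < 1000; i++) {
            cache.fileLength("_0.cfs");
        }
        System.out.println("remote calls: " + remoteCalls.get());
    }
}
```

The same pattern could apply to open InputStreams, with the extra care that
cached streams need to be closed when the file is deleted.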

The other thing I did was to create an embedded key-value store that stores
data in HDFS (HdfsKeyValueDirectory).  After creating that, I wrote a directory
to make use of it (FastHdfsKeyValueDirectory).  Then I wrote another directory
(JoinDirectory) that uses both the classic HdfsDirectory for long-term files
and the FastHdfsKeyValueDirectory for short-term files, i.e. NRT updates.
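
As a rough sketch of the joining idea (again, not the actual JoinDirectory
code), the example below routes writes between two in-memory stores. The
FileStore interface, MapFileStore, JoinedFileStore and the writeLongTerm
method are all hypothetical names, and the routing rule (default writes go to
the short-term store, long-term writes are explicit) is an assumption; the
real class may decide differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical join of a short-term store (small NRT writes) and a long-term store. */
class JoinedFileStore implements FileStore {
    private final FileStore longTerm;
    private final FileStore shortTerm;

    JoinedFileStore(FileStore longTerm, FileStore shortTerm) {
        this.longTerm = longTerm;
        this.shortTerm = shortTerm;
    }

    /** Default write path: assumed to be the NRT path, so it uses the short-term store. */
    @Override public void write(String name, byte[] data) {
        shortTerm.write(name, data);
    }

    /** Explicit long-term write path (an assumption), e.g. for large files. */
    public void writeLongTerm(String name, byte[] data) {
        longTerm.write(name, data);
    }

    /** Reads check the short-term store first, then fall back to the long-term store. */
    @Override public byte[] read(String name) {
        return shortTerm.exists(name) ? shortTerm.read(name) : longTerm.read(name);
    }

    @Override public boolean exists(String name) {
        return shortTerm.exists(name) || longTerm.exists(name);
    }

    /** Callers see one flat namespace: the union of both stores. */
    @Override public Set<String> list() {
        Set<String> all = new TreeSet<>(longTerm.list());
        all.addAll(shortTerm.list());
        return all;
    }

    public static void main(String[] args) {
        FileStore longTerm = new MapFileStore();
        FileStore shortTerm = new MapFileStore();
        JoinedFileStore joined = new JoinedFileStore(longTerm, shortTerm);
        joined.write("_1.si", new byte[] {1});
        joined.writeLongTerm("_0.cfs", new byte[] {2, 3});
        System.out.println("joined:     " + joined.list());
        System.out.println("short-term: " + shortTerm.list());
        System.out.println("long-term:  " + longTerm.list());
    }
}

/** Hypothetical stand-in for a directory: a flat namespace of named byte arrays. */
interface FileStore {
    void write(String name, byte[] data);
    byte[] read(String name); // returns null when the file is absent
    boolean exists(String name);
    Set<String> list();
}

/** In-memory store used here in place of real HDFS-backed storage. */
class MapFileStore implements FileStore {
    private final Map<String, byte[]> files = new HashMap<>();

    @Override public void write(String name, byte[] data) { files.put(name, data.clone()); }

    @Override public byte[] read(String name) {
        byte[] data = files.get(name);
        return data == null ? null : data.clone();
    }

    @Override public boolean exists(String name) { return files.containsKey(name); }

    @Override public Set<String> list() { return new TreeSet<>(files.keySet()); }
}
```

The point of the split is that callers see a single namespace while small,
frequent writes avoid the slower path.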

The end result is a commit time in the 1-2 ms range for the micro benchmark.
Blur now has no need for the WAL, because everything is committed to disk on
each mutate, and the overall NRT update throughput has greatly increased.

This doesn't solve the huge row problem, so that's next on the list.  :-)

Aaron

> NRT Updates using RAMDirectory & Swap
> -------------------------------------
>
>                 Key: BLUR-290
>                 URL: https://issues.apache.org/jira/browse/BLUR-290
>             Project: Apache Blur
>          Issue Type: New Feature
>    Affects Versions: experimental-dev
>            Reporter: Ravikumar
>         Attachments: BlurFieldsConsumer.java, BlurFieldsConsumer.java, 
> BlurFieldsConsumer.java, BlurFlushingIndexWriter.java, BlurIndexTracker.java, 
> BlurPostingsConsumer.java, BlurPostingsConsumer.java, 
> BlurPostingsFormat.java, BlurPostingsFormat.java, BlurRealTimeIndex.java, 
> BlurRealTimeIndex.java, BlurRealTimeIndexTest.java, 
> BlurRealTimeIndexWriter.java, BlurRealTimeManager.java, 
> BlurRealTimeManagerReopenThread.java, BlurRowCodec.java, BlurRowCodec.java, 
> BlurSegmentInfoFormat.java, BlurSegmentInfoWriter.java, 
> BlurTermsConsumer.java, BlurTermsConsumer.java, 
> CompressingRowIndexReader.java, CompressingRowIndexWriter.java, 
> CompressingRowReader.java, CompressingRowReader.java, 
> CompressingRowReader.java, CompressingRowWriter.java, 
> CompressingRowWriter.java, CompressingRowWriter.java, 
> GrowableByteArrayDataOutput.java, PrimeDocCache.java, 
> RealTimeTransactionRecorder.java, RealTimeTransactionRecorder.java, 
> RowCache.java, RowDocsCollector.java, RowDocsCollector.java, 
> RowReaderCache.java, RowReaderCache.java, SlabAllocator.java, 
> SlabRAMDirectory.java, SlabRAMFile.java, SlabRAMInputStream.java, 
> SlabRAMOutputStream.java, SortingMultiReader.java, SortingMultiReader.java, 
> TestCompressingRowWriter.java, TestCompressingRowWriter.java
>
>
> We have been discussing how to handle humungous rows in Blur (BLUR-220).
> Explore the idea of using a RAMDirectory at the front, backed by a
> persistent index.



