[ 
https://issues.apache.org/jira/browse/CHUKWA-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764748#action_12764748
 ] 

Eric Yang commented on CHUKWA-22:
---------------------------------

Chukwa's demux processor already orders the data, so about 95% of the time
the writes to HBase should be sequential.  My test machines also have 16GB of
RAM.  Hence, I am not seeing the memory and throughput problems yet.  Maybe my
dataset is too small when writing to HBase.  The paper is an interesting read.
Thanks for sharing.  I am open to suggestions on indexing Chukwa data.
Perhaps the data could be managed using TFiles, but that would make Chukwa
repeat a lot of work already done in HBase, which I would like to avoid.
Something to think about.

> Need index for chukwa sequence files
> ------------------------------------
>
>                 Key: CHUKWA-22
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-22
>             Project: Hadoop Chukwa
>          Issue Type: New Feature
>          Components: Data Processors
>         Environment: Redhat EL 5.1 and Java 6
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>
> Chukwa can collect large volumes of data, but the lack of an index
> prevents the Chukwa front end from serving data straight from HDFS.  This
> jira is the placeholder for designing an indexing service for Chukwa.  The
> plan is to build the indexing service on available software such as Lucene
> or Katta.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
