[
https://issues.apache.org/jira/browse/HADOOP-5040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665200#action_12665200
]
Eric Yang commented on HADOOP-5040:
-----------------------------------
A Chukwa record contains both key and value hashes. The short-term goal is to
index by key, and the long-term goal is to generate a full-body index on the
value hashes. In the current design, demux creates multiple spill files when
a time partition already contains data. To search through a time partition,
the indexes of its spill files need to be merged to provide a linear view of
the timeline. At the same time, the hourly or daily roll-up reduces the number
of files on disk. This means the indexing system must either rewrite the index
multiple times or use a time-leased mechanism for indexed keys.
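
To illustrate the "linear view" of several spill-file indexes, here is a
minimal sketch. It is not Chukwa code: the IndexEntry layout, the file names
and the assumption that each per-file index is already sorted by timestamp are
all hypothetical.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class IndexEntry(NamedTuple):
    """One hypothetical index entry: where a key occurs in a spill file."""
    timestamp: int   # record time, e.g. epoch milliseconds
    key: str         # indexed record key
    spill_file: str  # spill file that holds the record (made-up names below)
    offset: int      # byte offset of the record within that file


def merge_spill_indexes(*indexes: Iterable[IndexEntry]) -> Iterator[IndexEntry]:
    """Lazily merge per-spill-file indexes into one time-ordered stream.

    Assumes every input is already sorted by timestamp.
    """
    return heapq.merge(*indexes, key=lambda entry: entry.timestamp)


# Two spill files written for the same time partition.
spill_0 = [
    IndexEntry(1000, "host-a", "part-0.evt", 0),
    IndexEntry(3000, "host-b", "part-0.evt", 512),
]
spill_1 = [
    IndexEntry(2000, "host-a", "part-1.evt", 0),
    IndexEntry(4000, "host-c", "part-1.evt", 256),
]

timeline = list(merge_spill_indexes(spill_0, spill_1))
```

The merge is lazy, so a reader can stop as soon as it passes the end of the
time range it is interested in, without loading every spill-file index.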
Rewriting the index multiple times only works for small data sets, because the
scan time for Chukwa records grows linearly with the amount of data. Once the
data reaches petabytes, it makes more sense to have a time-leased index in
which each part of the index can expire and be remerged more easily.
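
One possible reading of a time-leased index is sketched below. The Segment
structure, the lease_expires field and the remerge step are assumptions made
for illustration only; nothing here is an existing Chukwa, Lucene or Katta
API. The idea is that each index segment stays valid until its lease runs out,
and only expired segments are rebuilt, so the whole index never has to be
rewritten at once.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Segment:
    """Hypothetical index segment covering the time range [start, end)."""
    start: int
    end: int
    lease_expires: int  # time after which the segment may be rebuilt
    postings: Dict[str, List[int]] = field(default_factory=dict)  # key -> timestamps


def split_by_lease(segments: List[Segment], now: int) -> Tuple[List[Segment], List[Segment]]:
    """Separate segments whose lease is still valid from expired ones."""
    live = [s for s in segments if s.lease_expires > now]
    expired = [s for s in segments if s.lease_expires <= now]
    return live, expired


def remerge(expired: List[Segment], new_lease: int) -> Segment:
    """Combine expired segments into one segment with a fresh lease."""
    merged = Segment(
        start=min(s.start for s in expired),
        end=max(s.end for s in expired),
        lease_expires=new_lease,
    )
    for seg in expired:
        for key, stamps in seg.postings.items():
            merged.postings.setdefault(key, []).extend(stamps)
    for stamps in merged.postings.values():
        stamps.sort()
    return merged


# Two segments from spill files of the same hour, plus one newer segment.
segments = [
    Segment(0, 3600, lease_expires=7200, postings={"host-a": [10, 900]}),
    Segment(0, 3600, lease_expires=7200, postings={"host-a": [500], "host-b": [42]}),
    Segment(3600, 7200, lease_expires=10800, postings={"host-a": [4000]}),
]
live, expired = split_by_lease(segments, now=7200)
rolled_up = remerge(expired, new_lease=86400)
```

Here the two segments for the first hour have expired and are folded into one,
which mirrors the hourly roll-up on disk, while the newer segment is left
untouched.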
By using Katta, it may be possible to partition the linear time index and
update it on multiple servers. A multicast search would then be broadcast to
all index partitions and retrieve the results more efficiently.
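
The broadcast search over a partitioned index can be sketched as a
scatter-gather query. This does not use the Katta API: the shards are plain
in-process dictionaries standing in for index servers, and search_shard and
broadcast_search are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical shard: maps a key to the sorted timestamps at which it appears.
Shard = Dict[str, List[int]]


def search_shard(shard: Shard, key: str, start: int, end: int) -> List[int]:
    """Return the timestamps for `key` within [start, end) on one shard."""
    return [t for t in shard.get(key, []) if start <= t < end]


def broadcast_search(shards: List[Shard], key: str, start: int, end: int) -> List[int]:
    """Send the same query to every shard in parallel and merge the hits."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = pool.map(lambda s: search_shard(s, key, start, end), shards)
        return sorted(t for hits in partials for t in hits)


# Three stand-in shards, each holding part of the time index.
shards = [
    {"host-a": [100, 200], "host-b": [150]},
    {"host-a": [300, 900]},
    {"host-c": [50]},
]
hits = broadcast_search(shards, "host-a", 100, 500)
```

Each shard only scans its own part of the index, so query time depends on the
largest shard rather than on the total index size.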
> Need index for chukwa sequence files
> ------------------------------------
>
> Key: HADOOP-5040
> URL: https://issues.apache.org/jira/browse/HADOOP-5040
> Project: Hadoop Core
> Issue Type: New Feature
> Components: contrib/chukwa
> Environment: Redhat EL 5.1 and Java 6
> Reporter: Eric Yang
> Assignee: Eric Yang
>
> Chukwa has the ability to collect large volumes of data, but the lack of an
> index prevents the Chukwa front end from serving data straight from HDFS. This
> JIRA is the placeholder for designing an indexing service for Chukwa. The plan
> is to create an indexing service based on available software such as Lucene or Katta.