[
https://issues.apache.org/jira/browse/CHUKWA-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732961#action_12732961
]
Eric Yang commented on CHUKWA-22:
---------------------------------
Building an index file would not be sufficient to serve Chukwa data straight from
HDFS for long-term operation. The cost of keeping the index in memory will
eventually require yet another distributed system to manage the index files.
Instead of reinventing the wheel, Chukwa should adopt a Bigtable-like solution
such as HBase to manage the data regions.
The mapreduce-to-hbase example (http://wiki.apache.org/hadoop/Hbase/MapReduce)
looks like exactly what Chukwa needs. An HBase table schema for Chukwa could look
like this:
Table: SystemMetrics-[TimeType]
Column Family: cpu
Column Family: memory
Column Family: disk
Column Family: temperature
Column Family: network
Column Family: default
Column Family: log
Each row represents an average over one interval (1 minute, 5 minutes, etc.),
as determined by the time type.
Examples of columns: idle:hostname1, busy:hostname1, idle:hostname2,
busy:hostname2
The log column family keeps the raw log entries for log viewing.
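To make the proposed layout concrete, here is a minimal in-memory sketch in Python. It does not use the HBase client API. The bucket widths, the use of the bucket start time as the row key, and every name in the code (TIME_TYPES, row_key, MetricsTable, add_sample, get_average) are assumptions for illustration only; the comment above does not specify them.

```python
from collections import defaultdict

# Assumed time types and bucket widths in seconds (hypothetical values).
TIME_TYPES = {"1min": 60, "5min": 300}


def table_name(time_type):
    """Build the table name following the SystemMetrics-[TimeType] pattern."""
    return f"SystemMetrics-{time_type}"


def row_key(timestamp, time_type):
    """Round a Unix timestamp down to the start of its bucket.

    Using the bucket start as the row key is an assumption of this sketch.
    """
    width = TIME_TYPES[time_type]
    return timestamp - timestamp % width


class MetricsTable:
    """In-memory stand-in for one table; not an HBase client."""

    def __init__(self, time_type):
        self.time_type = time_type
        self.name = table_name(time_type)
        # row key -> (column family, qualifier) -> [running sum, sample count]
        self._cells = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))

    def add_sample(self, timestamp, family, metric, host, value):
        """Fold one raw sample into the cell for its time bucket."""
        qualifier = f"{metric}:{host}"  # e.g. "idle:hostname1"
        cell = self._cells[row_key(timestamp, self.time_type)][(family, qualifier)]
        cell[0] += value
        cell[1] += 1

    def get_average(self, timestamp, family, metric, host):
        """Return the bucket average, or None if no sample was recorded."""
        row = self._cells.get(row_key(timestamp, self.time_type))
        if row is None:
            return None
        cell = row.get((family, f"{metric}:{host}"))
        if cell is None:
            return None
        return cell[0] / cell[1]
```

For example, two cpu idle samples for hostname1 taken at timestamps 1000 and 1100 fall into the same 5-minute row (bucket start 900), so reading either timestamp back returns their average. In a real deployment the averaging would more likely happen in the MapReduce job that loads the table.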
> Need index for chukwa sequence files
> ------------------------------------
>
> Key: CHUKWA-22
> URL: https://issues.apache.org/jira/browse/CHUKWA-22
> Project: Hadoop Chukwa
> Issue Type: New Feature
> Components: Data Processors
> Environment: Redhat EL 5.1 and Java 6
> Reporter: Eric Yang
> Assignee: Eric Yang
>
> Chukwa can collect large volumes of data, but the lack of an index
> prevents the Chukwa front end from serving data straight from HDFS. This JIRA is the
> placeholder for designing an indexing service for Chukwa. The plan is to
> create the indexing service based on available software like Lucene or Katta.