[
https://issues.apache.org/jira/browse/CHUKWA-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700938#action_12700938
]
Jerome Boulon commented on CHUKWA-155:
--------------------------------------
+1 on asking the Hadoop team to add a time stamp, since we want to do some
time-based analytics.
Demux can deal with any kind of data, provided a few rules are followed.
It is the parser's responsibility to provide:
- a time stamp; if the data has none, use the default one provided by the
Collector at the Chunk level
- a key that groups related information together according to how the data
is used
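
A rough sketch of those two rules, using the job history lines from this
issue. The function names and the Python form are hypothetical and are not
the Chukwa parser API; FINISH_TIME is assumed to be epoch milliseconds, and
the collector time stamp is passed in as a plain integer.

```python
import re
from typing import Optional

# Hypothetical patterns for the job history format quoted in this issue.
_FINISH_TIME = re.compile(r'FINISH_TIME="(\d+)"')
_JOB_ID = re.compile(r'JOBID="([^"]+)"')


def record_timestamp(line: str, chunk_collector_ts: int) -> int:
    """Use the record's own time stamp if it has one, otherwise fall back
    to the collector time stamp attached at the Chunk level."""
    match = _FINISH_TIME.search(line)
    return int(match.group(1)) if match else chunk_collector_ts


def grouping_key(line: str) -> Optional[str]:
    """Group every record of the same job together by its JOBID."""
    match = _JOB_ID.search(line)
    return match.group(1) if match else None
```

With this, the "RUNNING" line (which carries no time stamp) gets the
collector time stamp, and both lines share the key "job_200903310541_1747".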
When the data does not contain any time stamp, the system will make a best
effort to partition the data based on the collector time stamp. The parser
could/should still guarantee the order by specifying a key that contains the
SeqId + offset within the same chunk.
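
To illustrate the ordering idea, here is a minimal sketch, assuming each
record carries its chunk's SeqId and its offset within that chunk. The
Record type, the field names, and the zero-padded key layout are made up for
illustration and are not Chukwa classes.

```python
from typing import Dict, Iterable, NamedTuple


class Record(NamedTuple):
    job_id: str   # grouping key, e.g. the JOBID
    seq_id: int   # SeqId of the chunk the record came from (assumed field)
    offset: int   # position of the record within that chunk (assumed field)
    status: str


def order_key(record: Record) -> str:
    """Build a sortable string key from SeqId + offset. Zero padding keeps
    lexicographic order identical to numeric order."""
    return f"{record.seq_id:020d}-{record.offset:010d}"


def latest_status(records: Iterable[Record]) -> Dict[str, str]:
    """Replay records in SeqId + offset order so the last write per job
    is the most recent status, whatever the arrival order was."""
    latest: Dict[str, str] = {}
    for record in sorted(records, key=order_key):
        latest[record.job_id] = record.status
    return latest
```

If "SUCCESS" (higher SeqId) arrives before "RUNNING" (lower SeqId), sorting
on this key before the update leaves the job at "SUCCESS".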
> Job History status arrive out of order causing the status to update
> incorrectly.
> --------------------------------------------------------------------------------
>
> Key: CHUKWA-155
> URL: https://issues.apache.org/jira/browse/CHUKWA-155
> Project: Hadoop Chukwa
> Issue Type: Bug
> Components: data collection, Data Processors
> Environment: Redhat 5.1, Java 6
> Reporter: Eric Yang
> Assignee: Jerome Boulon
> Priority: Critical
>
> Job history contains lines like:
> Job JOBID="job_200903310541_1747" JOB_STATUS="RUNNING" .
> ...
> Job JOBID="job_200903310541_1747" FINISH_TIME="1238542231308"
> JOB_STATUS="SUCCESS" FINISHED_MAPS="1338" FINISHED_REDUCES="760"
> FAILED_MAPS="78" FAILED_REDUCES="43" COUNTERS="..." .
> When pushing the data through collectors and demux, the data can arrive out
> of order. The database is updated with status "RUNNING" instead of
> "SUCCESS".
> Chukwa Sequence ID can be used to sort out of order data before the data is
> pumped to database.