Eric Yang created CHUKWA-744:
--------------------------------
Summary: Refactor ETL process for HBaseWriter
Key: CHUKWA-744
URL: https://issues.apache.org/jira/browse/CHUKWA-744
Project: Chukwa
Issue Type: Task
Components: Data Processors
Affects Versions: 0.6.0
Reporter: Eric Yang
Assignee: Eric Yang
The current ETL classes are based on the Demux MapProcessor and ReduceProcessor.
The processors were designed to take an archive key embedded in the processor,
as well as a ChunkSaver that preserves chunks that cannot be parsed. This is
fine when running a MapReduce-based demux job to process data: the short-lived
task spills the ChunkSaver contents into a separate file for later examination.
However, in the long-running Chukwa agent the processors can leak memory over
time, because chunks are saved in ChunkSaver and never cleaned up.
It would be better to redesign the parser classes with well-defined behavior.
If a chunk cannot be parsed, the parser should throw a ParseException to the
upper layer, which can then retry or write the failure to the agent log.
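
The proposed behavior could look roughly like the sketch below. It is only an
illustration of the idea, not the actual Chukwa code: ChunkParser,
KeyValueParser, ChunkHandler and the retry count are hypothetical names, and
java.text.ParseException is assumed as the exception type since the issue does
not say which ParseException is meant.

```java
import java.nio.charset.StandardCharsets;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser contract: the parser holds no ChunkSaver and keeps no
// reference to chunks it failed to parse.
interface ChunkParser {
    List<String> parse(byte[] chunkData) throws ParseException;
}

// Toy parser used for illustration: every line must look like "key=value".
class KeyValueParser implements ChunkParser {
    @Override
    public List<String> parse(byte[] chunkData) throws ParseException {
        String text = new String(chunkData, StandardCharsets.UTF_8);
        List<String> records = new ArrayList<>();
        int offset = 0;
        for (String line : text.split("\n")) {
            if (line.indexOf('=') <= 0) {
                // Report the failure to the caller instead of saving the chunk.
                throw new ParseException("Unparseable line: " + line, offset);
            }
            records.add(line);
            offset += line.length() + 1;
        }
        return records;
    }
}

// Hypothetical upper layer: retries a fixed number of times, logs each
// failure, then drops the chunk so nothing accumulates in memory.
class ChunkHandler {
    private final ChunkParser parser;
    private final int maxAttempts;
    int failedAttempts = 0; // simple counter, kept only for inspection

    ChunkHandler(ChunkParser parser, int maxAttempts) {
        this.parser = parser;
        this.maxAttempts = maxAttempts;
    }

    List<String> handle(byte[] chunkData) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return parser.parse(chunkData);
            } catch (ParseException e) {
                failedAttempts++;
                // Stand-in for the agent log; a real agent would use its logger.
                System.err.println("parse attempt " + attempt + " failed at offset "
                        + e.getErrorOffset() + ": " + e.getMessage());
            }
        }
        return new ArrayList<>(); // give up; the chunk is not retained
    }
}
```

With this shape, the decision to retry, log or drop lives in one place in the
caller, and the parser itself stays stateless.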