[
https://issues.apache.org/jira/browse/PIG-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083882#comment-13083882
]
Dmitriy V. Ryaboy commented on PIG-2208:
----------------------------------------
This is just trading one issue for another. If we use too many counters, the
job is killed by the counter limit. If we don't, we spam the logs and the tasks
are killed for using too much local disk. We should at least do local
aggregation -- keep counters local to the task (a simple map), and log what we
would otherwise put in counters.
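The local aggregation suggested above could be sketched as follows. This is a hypothetical illustration, not the actual patch: the class name and methods are made up, and a real implementation would flush the map to the task log in the mapper/reducer `close()` or `cleanup()` hook instead of calling `reporter.incrCounter()` per name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of task-local counter aggregation: increments
// accumulate in an in-memory map, and the totals are logged once at
// task end rather than being emitted as per-name Hadoop counters.
public class LocalCounterAggregator {
    private final Map<String, Long> counts = new HashMap<>();

    // Called on the hot path instead of reporter.incrCounter(...).
    public void increment(String name, long delta) {
        counts.merge(name, delta, Long::sum);
    }

    public long get(String name) {
        return counts.getOrDefault(name, 0L);
    }

    // Intended to be logged once from the task's close()/cleanup().
    public String summary() {
        StringBuilder sb = new StringBuilder("Local counters:");
        counts.forEach((k, v) -> sb.append(' ').append(k).append('=').append(v));
        return sb.toString();
    }
}
```

This keeps the number of real Hadoop counters at zero for these metrics while still preserving the per-input/per-output record counts in the task logs.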
> Restrict number of PIG generated Hadoop counters
> -------------------------------------------------
>
> Key: PIG-2208
> URL: https://issues.apache.org/jira/browse/PIG-2208
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.8.1, 0.9.0
> Reporter: Richard Ding
> Assignee: Richard Ding
> Fix For: 0.9.1
>
> Attachments: PIG-2208.patch
>
>
> Pig 0.8 implemented Hadoop counters to track the number of records read for
> each input and the number of records written for each output (PIG-1389 &
> PIG-1299). On the other hand, Hadoop has imposed a limit on per-job counters
> (MAPREDUCE-1943), and jobs will fail if the counters exceed the limit.
> Therefore we need a way to cap the number of Pig-generated counters.
> Here are the two options:
> 1. Add an integer property (e.g., pig.counter.limit) to the pig property file
> (e.g., 20). If the number of inputs of a job exceeds this number, the input
> counters are disabled. Similarly, if the number of outputs of a job exceeds
> this number, the output counters are disabled.
> 2. Add a boolean property (e.g., pig.disable.counters) to the pig property
> file (default: false). If this property is set to true, then the
> Pig-generated counters are disabled.
>
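The two proposed options would appear in pig.properties roughly as below. The property names are the ones proposed in this issue; which option (and name) the attached patch actually implements may differ.

```properties
# Option 1 (proposed): disable input/output counters when a job has
# more than this many inputs or outputs.
pig.counter.limit=20

# Option 2 (proposed): turn off all Pig-generated counters.
pig.disable.counters=false
```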
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira