[ 
https://issues.apache.org/jira/browse/HADOOP-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652767#action_12652767
 ] 

Sharad Agarwal commented on HADOOP-4718:
----------------------------------------

The skip-bad-records feature needs a callback that reports the number of 
records the streaming process has finished processing. Counters were chosen 
for this because both pipes and streaming already support them; see the last 
point of 
https://issues.apache.org/jira/browse/HADOOP-153?focusedCommentId=12610897#action_12610897

bq. In particular, if the user updates a counter with the wrong name, bad 
things will presumably happen...
As far as I can see, this can only happen if the user defines their own 
counter with the same name. Is there any other problem that could occur? Would 
it be OK for now to document the counter names reserved by the framework, and 
perhaps log in the above loop that a framework counter is being updated?
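
A minimal sketch of the "document and log" idea, written without Hadoop 
dependencies so it stands alone. The class ReservedCounters and its methods 
are hypothetical, and the string literals are assumed stand-ins for 
SkipBadRecords.COUNTER_GROUP, COUNTER_MAP_PROCESSED_RECORDS and 
COUNTER_REDUCE_PROCESSED_GROUPS; real code would reference those constants 
directly.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.logging.Logger;

/** Hypothetical helper, not part of Hadoop. */
class ReservedCounters {
  private static final Logger LOG =
      Logger.getLogger(ReservedCounters.class.getName());

  // Assumed values; stand-ins for the SkipBadRecords constants.
  static final String GROUP = "SkippingTaskCounters";
  static final Set<String> NAMES = Collections.unmodifiableSet(
      new HashSet<>(Arrays.asList(
          "MapProcessedRecords", "ReduceProcessedGroups")));

  /** True if (group, counter) names a counter reserved by the framework. */
  static boolean isReserved(String group, String counter) {
    return GROUP.equals(group) && NAMES.contains(counter);
  }

  /**
   * Logs when a reserved counter is updated, so that an accidental name
   * clash in user code shows up in the task log. Returns whether it logged.
   */
  static boolean logIfReserved(String group, String counter, long amount) {
    if (!isReserved(group, counter)) {
      return false;
    }
    LOG.info("Framework counter " + group + ":" + counter
        + " incremented by " + amount
        + "; this advances the skipped-record index");
    return true;
  }
}
```

incrCounter could call logIfReserved(group, counter, amount) before moving 
currentRecStartIndex. This does not prevent a clash; it only makes one visible.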

The alternative, if we don't want to use counters for this at all, would be to 
add a mechanism to the streaming and pipes protocols. Streaming could write 
something like processedRecords to stderr, which the framework would parse. A 
similar message would need to be added to the pipes protocol as well.
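
A rough sketch of the framework-side parsing for the streaming case. The line 
format "processedRecords:<n>" is an assumption made up for illustration, as 
are the class and method names; no such message exists in the streaming 
protocol today.

```java
import java.util.OptionalLong;

/** Hypothetical parser for a stderr line of the form "processedRecords:<n>". */
class ProcessedRecordsParser {
  // Assumed prefix; the actual wire format would have to be agreed on.
  static final String PREFIX = "processedRecords:";

  /**
   * Returns the reported record count, or empty if the line is not a
   * well-formed processedRecords message (so it can be treated as ordinary
   * stderr output).
   */
  static OptionalLong parse(String line) {
    if (line == null) {
      return OptionalLong.empty();
    }
    String trimmed = line.trim();
    if (!trimmed.startsWith(PREFIX)) {
      return OptionalLong.empty();
    }
    try {
      long n = Long.parseLong(trimmed.substring(PREFIX.length()).trim());
      // A negative count makes no sense; ignore it rather than move the index.
      return n >= 0 ? OptionalLong.of(n) : OptionalLong.empty();
    } catch (NumberFormatException e) {
      return OptionalLong.empty();
    }
  }
}
```

The framework would advance currentRecStartIndex only when parse returns a 
value, so a user counter with a clashing name could no longer trigger skipping.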





> incrementing counters should not be used for triggering record skipping
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-4718
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4718
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Owen O'Malley
>
> The following code is really problematic:
> {code}
> public void incrCounter(String group, String counter, long amount) {
>   if (counters != null) {
>     counters.incrCounter(group, counter, amount);
>   }
>   if(skipping && SkipBadRecords.COUNTER_GROUP.equals(group) && (
>      SkipBadRecords.COUNTER_MAP_PROCESSED_RECORDS.equals(counter) ||
>      SkipBadRecords.COUNTER_REDUCE_PROCESSED_GROUPS.equals(counter))) {
>      //if application reports the processed records, move the 
>      //currentRecStartIndex to the next.
>      //currentRecStartIndex is the start index which has not yet been 
>      //finished and is still in task's stomach.
>      for(int i=0;i<amount;i++) {
>         currentRecStartIndex = currentRecIndexIterator.next();
>      }
>    ...
> }
> {code}
> In particular, if the user updates a counter with the wrong name, bad things 
> will presumably happen...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
