[ 
https://issues.apache.org/jira/browse/PIG-5355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620669#comment-16620669
 ] 

Koji Noguchi commented on PIG-5355:
-----------------------------------

{quote}
Outside of this jira, I still don't like the logic of 
{{HBaseTableInputFormat.getProgress()}}
{code:java}
if (bigLastRow.compareTo(bigEnd_) > 0) {
  return progressSoFar_;
}
{code}
which means when records have longer key length than 
{{max(startRow_.length,endRow_.length)}}, progress stays the same.
{quote}
[~satishsaley], [~rohini], how about we truncate {{currRow_}} (by calling {{Bytes.head}}) when 
{{maxRowLength < currRow_.length}}?

Or, I'm fine committing as is. The most important part of the patch is avoiding the 
negative progress report.
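
A minimal sketch of the truncation idea, not the patch itself. It assumes all three keys are brought to {{maxRowLength}} before comparison: {{Arrays.copyOf}} stands in for both {{Bytes.head}} (truncate) and {{Bytes.padTail}} (zero-pad), and the class and method names are hypothetical. The final clamp is an extra assumption that is not part of the attached patches.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;
import java.util.Arrays;

public class TruncatedProgressSketch {
    // Same {1, 0} header as in getProgress(): keeps the BigInteger positive.
    private static final byte[] HEADER = {1, 0};

    // Truncates (like Bytes.head) or zero-pads (like Bytes.padTail) to the given length.
    static byte[] fitToLength(byte[] row, int length) {
        return Arrays.copyOf(row, length);
    }

    static BigInteger toBigInteger(byte[] row) {
        byte[] withHeader = new byte[HEADER.length + row.length];
        System.arraycopy(HEADER, 0, withHeader, 0, HEADER.length);
        System.arraycopy(row, 0, withHeader, HEADER.length, row.length);
        return new BigInteger(withHeader);
    }

    // Hypothetical stand-in for getProgress(); returns a value in [0, 1].
    static float progress(byte[] startRow, byte[] endRow, byte[] currRow) {
        int maxRowLength = Math.max(startRow.length, endRow.length);
        BigInteger bigStart = toBigInteger(fitToLength(startRow, maxRowLength));
        BigInteger bigEnd = toBigInteger(fitToLength(endRow, maxRowLength));
        // A key longer than maxRowLength is truncated instead of stalling progress.
        BigInteger bigCurr = toBigInteger(fitToLength(currRow, maxRowLength));

        BigInteger total = bigEnd.subtract(bigStart);
        if (total.signum() <= 0) {
            return 0f;
        }
        // Assumption: clamp into [bigStart, bigEnd] so the result is never negative.
        BigInteger clamped = bigCurr.max(bigStart).min(bigEnd);
        BigDecimal processed = new BigDecimal(clamped.subtract(bigStart));
        return processed.divide(new BigDecimal(total), 4, RoundingMode.HALF_UP).floatValue();
    }
}
```

With a one-byte start key {{0x10}}, a one-byte end key {{0x20}} and a three-byte current key starting with {{0x15}}, the untruncated comparison would treat the current key as larger than the end key, while the truncated one reports 5/16 of the range.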

> Negative progress report by HBaseTableRecordReader
> --------------------------------------------------
>
>                 Key: PIG-5355
>                 URL: https://issues.apache.org/jira/browse/PIG-5355
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Satish Subhashrao Saley
>            Assignee: Satish Subhashrao Saley
>            Priority: Major
>         Attachments: PIG-5355-1.patch, PIG-5355-2.patch, PIG-5355-3.patch
>
>
> The logic for padding the current row does not consider the updated padded 
> row during the comparison. It ends up with a different length than expected. 
> This results in a negative value for {{processed}}.
> {code}
>             byte[] lastPadded = currRow_;
>             if (currRow_.length < endRow_.length) {
>                 lastPadded = Bytes.padTail(currRow_, endRow_.length - currRow_.length);
>             }
>             if (currRow_.length < startRow_.length) {
>                 lastPadded = Bytes.padTail(currRow_, startRow_.length - currRow_.length);
>             }
>             byte [] prependHeader = {1, 0};
>             BigInteger bigLastRow = new BigInteger(Bytes.add(prependHeader, lastPadded));
>             if (bigLastRow.compareTo(bigEnd_) > 0) {
>                 return progressSoFar_;
>             }
>             BigDecimal processed = new BigDecimal(bigLastRow.subtract(bigStart_));
> {code}
> The fix is to use {{lastPadded}} in the second {{if}} comparison and in the 
> {{Bytes.padTail}} call inside that {{if}}.
> PIG-4700 added progress reporting, which enabled ProgressHelper in Tez. It 
> calls {{getProgress}} 
> [here|https://github.com/apache/tez/blob/master/tez-api/src/main/java/org/apache/tez/common/ProgressHelper.java#L50]
>  on {{PigRecordReader}} 
> https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigRecordReader.java#L159
>  . Since Pig is reporting negative progress, the job is getting killed by the AM.
>  
>  
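
The padding bug described in the issue can be reproduced in isolation. The sketch below is an illustration under stated assumptions, not the committed patch: {{padTail}} is a local stand-in for HBase's {{Bytes.padTail}} (append N zero bytes), and the class and method names are hypothetical.

```java
import java.util.Arrays;

public class PadFixSketch {
    // Local stand-in for Bytes.padTail: appends `count` zero bytes to `row`.
    static byte[] padTail(byte[] row, int count) {
        return Arrays.copyOf(row, row.length + count);
    }

    // Original logic: the second `if` pads currRow again and discards the first padding.
    static byte[] padBuggy(byte[] currRow, byte[] startRow, byte[] endRow) {
        byte[] lastPadded = currRow;
        if (currRow.length < endRow.length) {
            lastPadded = padTail(currRow, endRow.length - currRow.length);
        }
        if (currRow.length < startRow.length) {
            lastPadded = padTail(currRow, startRow.length - currRow.length);
        }
        return lastPadded;
    }

    // Described fix: the second `if` compares and pads lastPadded, so the result
    // always has length max(currRow.length, startRow.length, endRow.length).
    static byte[] padFixed(byte[] currRow, byte[] startRow, byte[] endRow) {
        byte[] lastPadded = currRow;
        if (currRow.length < endRow.length) {
            lastPadded = padTail(currRow, endRow.length - currRow.length);
        }
        if (lastPadded.length < startRow.length) {
            lastPadded = padTail(lastPadded, startRow.length - lastPadded.length);
        }
        return lastPadded;
    }
}
```

For a 1-byte current row, a 2-byte start row and a 4-byte end row, {{padBuggy}} returns 2 bytes while {{padFixed}} returns 4. The shorter array yields a much smaller {{BigInteger}}, which is how the subtraction from {{bigStart_}} can go negative.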



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
