[ 
https://issues.apache.org/jira/browse/PIG-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385858#comment-15385858
 ] 

Nandor Kollar commented on PIG-3891:
------------------------------------

The patch for FileBasedOutputSizeReader looks good to me, but I think a new 
test case is required in TestMRJobStats to cover this case, not just in 
piggybank. I attached a 3rd version of the patch with a new test case and 
changed the MultiStore test case too. It seems that FileBasedOutputSizeReader 
is used when the script is executed in batch mode and stores its result in 
multiple stores (using MultiStores in subdirectories), not just in one (for 
a single store command, the MapReduce counters are taken into account).

[~rohini] Wouldn't testGetOutputSizeUsingFileBasedStorage in TestMRJobStats 
test the file size if it is a file and not a path? With the patch applied, this 
test is green.

> FileBasedOutputSizeReader does not calculate size of files in sub-directories
> -----------------------------------------------------------------------------
>
>                 Key: PIG-3891
>                 URL: https://issues.apache.org/jira/browse/PIG-3891
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Rohini Palaniswamy
>            Assignee: Nandor Kollar
>         Attachments: PIG-3891-1.patch, PIG-3891-2.patch, PIG-3891-3.patch
>
>
> FileBasedOutputSizeReader only includes files in the top-level output 
> directory. So if files are stored under subdirectories (e.g. with 
> MultiStorage), it does not report the bytes written correctly. 
> 0.11 shows the correct number of total bytes written, so this is a 
> regression. A quick look at the code shows that 
> JobStats.addOneOutputStats() in 0.11 also does not iterate recursively, and 
> the code is the same as in FileBasedOutputSizeReader. Need to investigate where 
> the correct value comes from in 0.11 and fix it in 0.12.1/0.13.
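
To illustrate the difference the report describes, here is a minimal sketch 
that contrasts a top-level-only size sum with a recursive one. It is not the 
code from the attached patches: it uses java.nio.file on the local filesystem 
as a stand-in for Hadoop's FileSystem API, and the class and method names 
(RecursiveSizeSketch, topLevelSize, recursiveSize) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; not taken from Pig or from the PIG-3891 patches.
class RecursiveSizeSketch {

    // Sums only the regular files directly under dir and ignores subdirectories.
    // This mirrors the behaviour described in the issue.
    static long topLevelSize(Path dir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            for (Path child : children) {
                if (Files.isRegularFile(child)) {
                    total += Files.size(child);
                }
            }
        }
        return total;
    }

    // Sums regular files at any depth by descending into subdirectories.
    // Also accepts a plain file, in which case it returns that file's size.
    static long recursiveSize(Path path) throws IOException {
        if (Files.isRegularFile(path)) {
            return Files.size(path);
        }
        if (!Files.isDirectory(path)) {
            return 0; // neither a regular file nor a directory
        }
        long total = 0;
        try (DirectoryStream<Path> children = Files.newDirectoryStream(path)) {
            for (Path child : children) {
                total += recursiveSize(child);
            }
        }
        return total;
    }
}
```

With an output directory holding a 10-byte file plus a subdirectory holding a 
5-byte file, topLevelSize reports 10 and recursiveSize reports 15. The sketch 
follows symbolic links, so a real implementation would need to guard against 
link cycles.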



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
