[ https://issues.apache.org/jira/browse/PIG-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387186#comment-15387186 ]

Rohini Palaniswamy commented on PIG-3891:
-----------------------------------------

A few comments:
- Could you fix this test to run in Tez mode as well? I did not realize this one had not been fixed yet.
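{code}
// Hard-codes MapReduce and local mode, so the test never runs under Tez
pigServer = new PigServer(ExecType.MAPREDUCE, cluster.getProperties());
pigServerLocal = new PigServer(ExecType.LOCAL);
{code}

to 

{code}
// Uses the exec type of the test cluster (e.g. mr or tez) and the matching local mode
pigServer = new PigServer(cluster.getExecType(), cluster.getProperties());
pigServerLocal = new PigServer(Util.getLocalTestMode());
{code}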

This would also require changes to the test, such as replacing MRJobStats with JobStats. You can test it by running: ant test -Dhadoopversion=23 -Dexectype=tez -Dtestcase=TestMultiStorage

- Can you also add asserts on getMultiStoreCounters() to check the bytes written for each individual output?
- The test name is too verbose. Could you rename it to just testOutputStats and add a comment at the beginning of the test such as:
// Test that the bytes written are correct with sub-directories and multiple MultiStorage statements

The rest looks good.



> FileBasedOutputSizeReader does not calculate size of files in sub-directories
> -----------------------------------------------------------------------------
>
>                 Key: PIG-3891
>                 URL: https://issues.apache.org/jira/browse/PIG-3891
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Rohini Palaniswamy
>            Assignee: Nandor Kollar
>         Attachments: PIG-3891-1.patch, PIG-3891-2.patch, PIG-3891-3.patch
>
>
> FileBasedOutputSizeReader only includes files in the top-level output
> directory, so if files are stored under sub-directories (for example, by
> MultiStorage), the bytes written are not reported correctly.
> 0.11 shows the correct total number of bytes written, so this is a
> regression. A quick look at the code shows that JobStats.addOneOutputStats()
> in 0.11 does not iterate recursively either, and its code is the same as
> FileBasedOutputSizeReader. We need to investigate where the correct value
> comes from in 0.11 and fix it in 0.12.1/0.13.
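A minimal local-filesystem sketch of the difference described in the issue, using java.nio.file rather than Pig's actual FileBasedOutputSizeReader or the Hadoop FileSystem API. The class name OutputSizeDemo, the methods topLevelSize and recursiveSize, and the file names are all hypothetical: the first method counts only regular files directly under the output directory (the behaviour reported above), while the second also walks sub-directories such as those MultiStorage creates.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical illustration only; not Pig's FileBasedOutputSizeReader.
public class OutputSizeDemo {

    // Mirrors the reported bug: only files directly under dir are counted,
    // so anything inside a sub-directory is silently skipped.
    static long topLevelSize(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile)
                          .mapToLong(OutputSizeDemo::sizeOf)
                          .sum();
        }
    }

    // Walks the whole tree, so files under sub-directories are counted too.
    static long recursiveSize(Path dir) throws IOException {
        try (Stream<Path> entries = Files.walk(dir)) {
            return entries.filter(Files::isRegularFile)
                          .mapToLong(OutputSizeDemo::sizeOf)
                          .sum();
        }
    }

    // Files.size throws a checked exception, so wrap it for use in a stream.
    private static long sizeOf(Path p) {
        try {
            return Files.size(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical layout: one top-level part file plus one file in a sub-directory.
        Path out = Files.createTempDirectory("pig-output");
        Files.write(out.resolve("part-m-00000"), new byte[10]);
        Path sub = Files.createDirectories(out.resolve("a"));
        Files.write(sub.resolve("a-0"), new byte[25]);

        System.out.println("top-level only: " + topLevelSize(out)); // 10
        System.out.println("recursive:      " + recursiveSize(out)); // 35
    }
}
```

On HDFS, a comparable recursive total could likely be obtained with FileSystem.listFiles(path, true) or FileSystem.getContentSummary(path).getLength(); whether either fits the existing output size reader interface is an assumption not checked here.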



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
