[ 
https://issues.apache.org/jira/browse/PIG-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723181#comment-14723181
 ] 

Xianda Ke commented on PIG-4655:
--------------------------------

Hi [~mohitsabharwal], thanks for your comments.
1. The member declarations have been moved to the top of the class. Thanks.

2. addInputInfoForSparkOper() is a helper function that calls 
SparkJobStats.addInputStats().
For each POStore, we start a job and then create a SparkJobStats to collect the 
I/O statistics. When a SparkOperator has multiple POStores, we create multiple 
SparkJobStats. But the input info (POLoads) of a SparkOperator should be 
collected only once. To avoid collecting the input info repeatedly, we need 
a Set of SparkOperators that records whether the input info of a SparkOperator 
has already been computed. I think it is better to put this Set in 
SparkPigStats. That's why I created the helper function 
addInputInfoForSparkOper() and didn't put it in class SparkJobStats. 
Any comments?
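
To illustrate the idea, here is a minimal sketch. It does not use the real Pig 
classes: Oper, JobStats and PigStats are hypothetical stand-ins for 
SparkOperator, SparkJobStats and SparkPigStats, and the field names and method 
signatures are assumptions made only for this example, not the code in the 
patch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class InputStatsOnceSketch {

    // Hypothetical stand-in for SparkOperator: a name plus the paths of its loads.
    // It relies on the default identity-based equals()/hashCode().
    static class Oper {
        final String name;
        final List<String> loads;

        Oper(String name, List<String> loads) {
            this.name = name;
            this.loads = loads;
        }
    }

    // Hypothetical stand-in for SparkJobStats: one instance is created per store.
    static class JobStats {
        final List<String> inputs = new ArrayList<>();

        void addInputStats(String load) {
            inputs.add(load);
        }
    }

    // Hypothetical stand-in for SparkPigStats: it owns the Set of operators
    // whose input info has already been collected.
    static class PigStats {
        private final Set<Oper> operatorsWithInputInfo = new HashSet<>();

        void addInputInfoForSparkOper(Oper oper, JobStats jobStats) {
            // Set.add() returns false if the operator was already present,
            // so the loads of an operator are recorded only once.
            if (!operatorsWithInputInfo.add(oper)) {
                return;
            }
            for (String load : oper.loads) {
                jobStats.addInputStats(load);
            }
        }
    }

    public static void main(String[] args) {
        List<String> loads = new ArrayList<>();
        loads.add("a");
        loads.add("b");
        Oper oper = new Oper("op1", loads);

        PigStats pigStats = new PigStats();
        JobStats firstStore = new JobStats();   // job stats for the first store
        JobStats secondStore = new JobStats();  // job stats for the second store

        pigStats.addInputInfoForSparkOper(oper, firstStore);
        pigStats.addInputInfoForSparkOper(oper, secondStore);

        System.out.println("first store inputs:  " + firstStore.inputs);
        System.out.println("second store inputs: " + secondStore.inputs);
    }
}
```

In this sketch the Set lives in the PigStats stand-in because it has to be 
shared by all the per-store JobStats of the same operator; a single JobStats 
cannot know whether another one has already recorded the inputs.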

Thanks,
Xianda

> Support InputStats in spark mode
> --------------------------------
>
>                 Key: PIG-4655
>                 URL: https://issues.apache.org/jira/browse/PIG-4655
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Xianda Ke
>            Assignee: Xianda Ke
>             Fix For: spark-branch
>
>         Attachments: PIG-4655.patch
>
>
> Currently, InputStats is not implemented in spark mode. 
> The JUnit case TestPigRunner.testEmptyFileCounter() will fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
