[ https://issues.apache.org/jira/browse/PIG-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723181#comment-14723181 ]
Xianda Ke commented on PIG-4655:
--------------------------------
Hi [~mohitsabharwal], Thanks for your comments.
1. The member declarations have been moved to the top of the class. Thanks.
2. addInputInfoForSparkOper() is a helper function that calls
SparkJobStats.addInputStats().
For each POStore, we start a job and then create a SparkJobStats to collect the
I/O statistics. When a SparkOperator has multiple POStores, we create multiple
SparkJobStats. But the input info (POLoads) of a SparkOperator should be
collected only once. To avoid collecting the input info repeatedly, we need
a Set of SparkOperators that records whether the input info of a SparkOperator
has already been computed. I think it is better to keep this Set in
SparkPigStats. That's why I created the helper function
addInputInfoForSparkOper() and didn't put it in class SparkJobStats.
Any comments?
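To illustrate the idea, here is a minimal sketch of the de-duplication. It is
not the code in the patch: SketchOperator, SketchJobStats and SketchPigStats
are hypothetical stand-ins for SparkOperator, SparkJobStats and SparkPigStats,
and the POLoads are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a SparkOperator: a name plus the inputs (POLoads) it reads.
class SketchOperator {
    final String name;
    final List<String> loads;

    SketchOperator(String name, List<String> loads) {
        this.name = name;
        this.loads = loads;
    }
}

// Hypothetical stand-in for SparkJobStats: one instance is created per POStore (job).
class SketchJobStats {
    final List<String> inputs = new ArrayList<>();

    void addInputStats(String load) {
        inputs.add(load);
    }
}

// Hypothetical stand-in for SparkPigStats: it owns the Set because it outlives
// the individual per-job stats objects.
class SketchPigStats {
    // Operators whose input info has already been collected.
    private final Set<SketchOperator> operatorsWithInputInfo = new HashSet<>();

    void addInputInfoForSparkOper(SketchOperator op, SketchJobStats jobStats) {
        // Set.add() returns false if the operator was already present,
        // so the input info is collected only for the first job of an operator.
        if (!operatorsWithInputInfo.add(op)) {
            return;
        }
        for (String load : op.loads) {
            jobStats.addInputStats(load);
        }
    }
}
```

With this shape, calling addInputInfoForSparkOper() once per POStore is safe:
only the first SketchJobStats of an operator receives the input info, and the
later ones are left untouched.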
Thanks,
Xianda
> Support InputStats in spark mode
> --------------------------------
>
> Key: PIG-4655
> URL: https://issues.apache.org/jira/browse/PIG-4655
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Xianda Ke
> Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4655.patch
>
>
> Currently, InputStats is not implemented in spark mode.
> The JUnit case TestPigRunner.testEmptyFileCounter() will fail.