[
https://issues.apache.org/jira/browse/PIG-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693038#comment-14693038
]
kexianda commented on PIG-4634:
-------------------------------
Hi [~mohitsabharwal] & [~xuefuz],
PIG-4634-3.patch is attached. Could you please review the code?
1. Implement the record count logic using SparkCounter:
(a). SparkPigStatusReporter.java: a singleton factory for obtaining SparkCounters.
(b). Create a new SparkCounter in StoreConverter.convert() and increment it
in FromTupleFunction.
We append the store operator's key to the counter name (in
SparkStatsUtil.getStoreSparkCounterName()) to avoid counter name conflicts
when output files have the same short name (e.g., /tmp1/output and /tmp2/output).
2. Some minor changes/fixes:
(a). Set pigContext when initializing SparkPigStats.
(b). Implement getOutputAlias() in spark mode.
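To illustrate the naming scheme in 1(b), here is a minimal standalone sketch. It is not the code in the patch: the class StoreCounterSketch, its methods, the name format, and the operator keys "scope-1"/"scope-2" are all hypothetical, and plain AtomicLong values stand in for Spark counters. It only shows why appending the store operator's key keeps two outputs with the same short name from sharing one counter.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; not the classes or name format used in the patch.
public class StoreCounterSketch {
    // Stand-in for a counter registry: counter name -> record count.
    private static final Map<String, AtomicLong> COUNTERS = new ConcurrentHashMap<>();

    // Last path component, e.g. "/tmp1/output" -> "output".
    static String shortName(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? path : path.substring(slash + 1);
    }

    // Append the store operator's key so equal short names do not collide.
    static String counterName(String operatorKey, String outputPath) {
        return shortName(outputPath) + "_" + operatorKey;
    }

    // Called once per stored record.
    static void increment(String name) {
        COUNTERS.computeIfAbsent(name, k -> new AtomicLong()).incrementAndGet();
    }

    // Current count, or 0 if the counter was never incremented.
    static long value(String name) {
        AtomicLong counter = COUNTERS.get(name);
        return counter == null ? 0L : counter.get();
    }

    public static void main(String[] args) {
        String first = counterName("scope-1", "/tmp1/output");
        String second = counterName("scope-2", "/tmp2/output");
        increment(first);
        increment(first);
        increment(second);
        // Both paths share the short name "output", but the counts stay separate.
        System.out.println(first + " = " + value(first));
        System.out.println(second + " = " + value(second));
    }
}
```

Without the operator key, both stores above would map to the single name "output" and their record counts would be merged.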
How to test:
Run TestPigRunner.simpleTest()
> Fix records count issues in output statistics
> ---------------------------------------------
>
> Key: PIG-4634
> URL: https://issues.apache.org/jira/browse/PIG-4634
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: kexianda
> Assignee: kexianda
> Fix For: spark-branch
>
> Attachments: PIG-4634-3.patch, PIG-4634.patch, PIG-4634_2.patch
>
>
> Test cases simpleTest() and simpleTest2() in TestPigRunner fail because of
> the following issues:
> 1. The pig context in SparkPigStats isn't initialized.
> 2. The record count logic hasn't been implemented.
> 3. getOutputAlias(), getPigProperties(), getBytesWritten() and
> getRecordWritten() have not been implemented.