[ 
https://issues.apache.org/jira/browse/PIG-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693038#comment-14693038
 ] 

kexianda commented on PIG-4634:
-------------------------------

Hi [~mohitsabharwal] & [~xuefuz],
PIG-4634-3.patch is attached. Would you please help review the code?

1. Implement the record count logic using SparkCounter
(a). SparkPigStatusReporter.java: a singleton factory for getting Spark counters.
(b). Create a new SparkCounter in StoreConverter.convert(), and increment the 
counter in FromTupleFunction.
We append the key of the store operator to the counter name (in 
SparkStatsUtil.getStoreSparkCounterName()) to avoid counter name conflicts 
when output files have the same short name (say, /tmp1/output & /tmp2/output).
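
To illustrate the naming scheme in 1(b), here is a minimal, self-contained sketch. It is not the code in the patch: the class StoreCounterSketch, the "store_" prefix, the operator keys "scope-10"/"scope-20", and the in-memory map standing in for the Spark counters are all assumptions made for illustration only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class StoreCounterSketch {
    // Hypothetical stand-in for the Spark counters; plain in-memory longs.
    private static final Map<String, AtomicLong> COUNTERS = new ConcurrentHashMap<>();

    // Last path component, e.g. "/tmp1/output" -> "output".
    public static String shortName(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? path : path.substring(slash + 1);
    }

    // Include the store operator key so that two stores whose outputs share
    // a short name still get distinct counters. The format is an assumption.
    public static String counterName(String operatorKey, String outputPath) {
        return "store_" + operatorKey + "_" + shortName(outputPath);
    }

    // Called once per stored record in this sketch.
    public static void increment(String name) {
        COUNTERS.computeIfAbsent(name, k -> new AtomicLong()).incrementAndGet();
    }

    // Current value of a counter, or 0 if it was never incremented.
    public static long value(String name) {
        AtomicLong c = COUNTERS.get(name);
        return c == null ? 0L : c.get();
    }

    public static void main(String[] args) {
        String a = counterName("scope-10", "/tmp1/output");
        String b = counterName("scope-20", "/tmp2/output");
        increment(a);
        increment(a);
        increment(b);
        System.out.println(a + " = " + value(a));
        System.out.println(b + " = " + value(b));
    }
}
```

With the short name alone, both stores above would share one counter and their record counts would be merged; with the operator key in the name they are tracked separately.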

2. Some minor changes/fixes:
(a). Set pigContext when initializing SparkPigStats.
(b). Implement getOutputAlias() in Spark mode.


How to test:
Run TestPigRunner.simpleTest()

> Fix records count issues in output statistics
> ---------------------------------------------
>
>                 Key: PIG-4634
>                 URL: https://issues.apache.org/jira/browse/PIG-4634
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: kexianda
>            Assignee: kexianda
>             Fix For: spark-branch
>
>         Attachments: PIG-4634-3.patch, PIG-4634.patch, PIG-4634_2.patch
>
>
> Test cases simpleTest() and simpleTest2() in TestPigRunner fail because of 
> the following issues:
> 1. The Pig context in SparkPigStats isn't initialized.
> 2. The record count logic hasn't been implemented.
> 3. getOutputAlias(), getPigProperties(), getBytesWritten() and 
> getRecordWritten() have not been implemented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
