[
https://issues.apache.org/jira/browse/PIG-4784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111798#comment-15111798
]
liyunzhang_intel commented on PIG-4784:
---------------------------------------
In MR mode, Hadoop provides counters
(https://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/mapred/Task.Counter.html)
for MAP_INPUT_RECORDS and REDUCE_OUTPUT_RECORDS. With multiple inputs or
outputs, however, there is no Hadoop API that reports MAP_INPUT_RECORDS and
REDUCE_OUTPUT_RECORDS for each file.
When there are multiple inputs in MR mode, Pig increments a counter each time
it reads a record
(https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigRecordReader.java#L148)
from an input file.
When there are multiple outputs in MR mode, Pig increments a counter each time
it gets a result from POStore
(https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POStore.java#L170).
So in MR mode, "pig.disable.counter" only applies to the multiple-inputs and
multiple-outputs case.
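
To illustrate the per-file counting described above, here is a minimal sketch.
It is not the code in PigRecordReader or POStore; the class name
PerFileRecordCounter and its methods are hypothetical, and the only point is
that one counter is kept per file and that a "disabled" flag skips the
bookkeeping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: not Pig's PigRecordReader or POStore code.
class PerFileRecordCounter {
    private final Map<String, Long> counts = new LinkedHashMap<>();
    private final boolean disabled;

    PerFileRecordCounter(boolean disabled) {
        this.disabled = disabled;
    }

    // Call once per record read from (or stored to) the named file.
    void increment(String fileName) {
        if (disabled) {
            return; // counters switched off: skip the bookkeeping
        }
        counts.merge(fileName, 1L, Long::sum);
    }

    // Number of records counted for the file; 0 if nothing was counted.
    long get(String fileName) {
        return counts.getOrDefault(fileName, 0L);
    }

    public static void main(String[] args) {
        PerFileRecordCounter counter = new PerFileRecordCounter(false);
        counter.increment("input_a");
        counter.increment("input_a");
        counter.increment("input_b");
        System.out.println("input_a=" + counter.get("input_a"));
        System.out.println("input_b=" + counter.get("input_b"));
    }
}
```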
In Spark mode, there is no Spark API that reports the input and output record
counts for a single input or output. In PIG-4655 and PIG-4634 we implemented
the counters. So in Spark mode, whether there is a single input or multiple
inputs, the counters are disabled and the reported number of input and output
records is always -1 when pig.disable.counter is true.
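
A rough sketch of the reporting rule above, for illustration only. The class
DisableCounterSketch and the method reportedRecords are hypothetical names, not
part of the patch, and treating a missing property as "false" is an assumption.

```java
import java.util.Properties;

// Hypothetical sketch of the "-1 when disabled" rule; not the PIG-4784 patch.
class DisableCounterSketch {
    static final String DISABLE_COUNTER_KEY = "pig.disable.counter";

    // Returns -1 when counters are disabled, otherwise the counted value.
    static long reportedRecords(Properties conf, long countedRecords) {
        // Assumption: a missing property means counters stay enabled.
        boolean disabled =
            Boolean.parseBoolean(conf.getProperty(DISABLE_COUNTER_KEY, "false"));
        return disabled ? -1L : countedRecords;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(DISABLE_COUNTER_KEY, "true");
        System.out.println(reportedRecords(conf, 5L));
    }
}
```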
> Enable "pig.disable.counter" for spark engine
> ---------------------------------------------
>
> Key: PIG-4784
> URL: https://issues.apache.org/jira/browse/PIG-4784
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4784.patch
>
>
> When you set pig.disable.counter to "true" in conf/pig.properties, the
> counters that track the number of input records and output records are
> disabled.
> The following unit tests are designed to test this, but they currently fail:
> org.apache.pig.test.TestPigRunner#testDisablePigCounters
> org.apache.pig.test.TestPigRunner#testDisablePigCounters2
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)