[
https://issues.apache.org/jira/browse/PIG-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rohini Palaniswamy updated PIG-4757:
------------------------------------
Attachment: PIG-4757-1.patch
bq. Another thing to fix is for hbase inputs, there is no bytes read with Tez,
but it is displayed in MR.
This is invalid. For HBase inputs, Pig displays HDFS_BYTES_WRITTEN in MR,
which is incorrect. Displaying nothing in Tez is better.
HBase Counters
BYTES_IN_REMOTE_RESULTS  166228275  0  166228275
BYTES_IN_RESULTS         166228275  0  166228275
Displaying the bytes from the counters above would make sense, but that could
also be incorrect if any UDF in the plan also accessed HBase. So I am leaving
it as is for now.
> Job stats on successfully read/output records wrong with multiple
> inputs/outputs
> --------------------------------------------------------------------------------
>
> Key: PIG-4757
> URL: https://issues.apache.org/jira/browse/PIG-4757
> Project: Pig
> Issue Type: Bug
> Components: tez
> Reporter: Rohini Palaniswamy
> Assignee: Daniel Dai
> Fix For: 0.16.0
>
> Attachments: PIG-4757-1.patch
>
>
> TezVertexStats uses TaskCounter.INPUT_RECORDS_PROCESSED to display records
> read from MRInput. But in the case of a replicate join or scalar, that
> counter also includes the replicate join input. We need a Pig-specific
> counter (MULTI_INPUTS_RECORD_COUNTER) in POSimpleTezLoad.
> TezVertexStats uses TaskCounter.OUTPUT_RECORDS to display records stored to
> MROutput if there is a single store. If there are multiple stores, it uses
> MULTI_STORE_RECORD_COUNTER and there are no issues. If there is a single
> store alongside another output, the value from OUTPUT_RECORDS is wrong. We
> need to use MULTI_STORE_RECORD_COUNTER in all cases, even when there is only
> a single store.
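
The miscount described above can be illustrated with a toy model. This is
only a sketch: it does not use the real Tez or Pig counter APIs, and the
class CounterSketch, the aliases loadA and storeB, the ":alias" key suffix,
and the record counts are all hypothetical. It assumes the "other output" is
something like an edge to a downstream vertex. Generic vertex-level counters
sum over every input or output, while a per-load or per-store counter sees
only its own records.

```java
import java.util.HashMap;
import java.util.Map;

public class CounterSketch {

    /** Toy stand-in for one vertex's counters (hypothetical, not the Tez API). */
    public static class VertexCounters {
        private final Map<String, Long> values = new HashMap<>();

        public void increment(String name, long delta) {
            values.merge(name, delta, Long::sum);
        }

        public long get(String name) {
            return values.getOrDefault(name, 0L);
        }
    }

    /** Simulates one vertex with a load, a replicate join input, a store and one other output. */
    public static VertexCounters simulateVertex() {
        VertexCounters c = new VertexCounters();

        // Load from MRInput: seen by the generic counter and by the per-load counter.
        c.increment("INPUT_RECORDS_PROCESSED", 100);
        c.increment("MULTI_INPUTS_RECORD_COUNTER:loadA", 100);

        // Replicate join input: seen only by the generic counter.
        c.increment("INPUT_RECORDS_PROCESSED", 40);

        // Store to MROutput: seen by the generic counter and by the per-store counter.
        c.increment("OUTPUT_RECORDS", 60);
        c.increment("MULTI_STORE_RECORD_COUNTER:storeB", 60);

        // Another, non-store output (assumed here): seen only by the generic counter.
        c.increment("OUTPUT_RECORDS", 25);

        return c;
    }

    public static void main(String[] args) {
        VertexCounters c = simulateVertex();
        System.out.println("Generic input counter:  " + c.get("INPUT_RECORDS_PROCESSED"));
        System.out.println("Per-load counter:       " + c.get("MULTI_INPUTS_RECORD_COUNTER:loadA"));
        System.out.println("Generic output counter: " + c.get("OUTPUT_RECORDS"));
        System.out.println("Per-store counter:      " + c.get("MULTI_STORE_RECORD_COUNTER:storeB"));
    }
}
```

In this model the generic counters over-report both the records read by the
load and the records written by the store, which is why the per-load and
per-store counters are the safer source for job stats.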
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)