[
https://issues.apache.org/jira/browse/TEZ-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15566278#comment-15566278
]
Kuhu Shukla edited comment on TEZ-3459 at 10/11/16 7:00 PM:
------------------------------------------------------------
I ran the jar and think I know what the issue is.
The framework, FS, and other counters do show up in the task-level and
DAG-level counters when running in yarn-tez mode, as shown below (the values
have been overwritten).
{code}
[INFO] [TezChild] |runtime.LogicalIOProcessorRuntimeTask|: Final Counters for
attempt_123_0001_1_00_000000_0: Counters: 28 [[File System Counters
FILE_BYTES_READ=123, FILE_BYTES_WRITTEN=123, FILE_READ_OPS=0,
FILE_LARGE_READ_OPS=0, FILE_WRITE_OPS=0, HDFS_BYTES_READ=1234,
HDFS_BYTES_WRITTEN=0, HDFS_READ_OPS=2, HDFS_LARGE_READ_OPS=0,
HDFS_WRITE_OPS=0][org.apache.tez.common.counters.TaskCounter
SPLIT_RAW_BYTES=123, SPILLED_RECORDS=123, GC_TIME_MILLIS=123,
CPU_MILLISECONDS=123, PHYSICAL_MEMORY_BYTES=123456,
VIRTUAL_MEMORY_BYTES=123456, COMMITTED_HEAP_BYTES=12345678,
INPUT_RECORDS_PROCESSED=0, INPUT_SPLIT_LENGTH_BYTES=1234, OUTPUT_RECORDS=123,
OUTPUT_BYTES=123, OUTPUT_BYTES_WITH_OVERHEAD=123, OUTPUT_BYTES_PHYSICAL=12,
ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=0,
ADDITIONAL_SPILL_COUNT=0,
SHUFFLE_CHUNK_COUNT=1][example.MapredColorCount$ColorCounter INPUT_RECORDS=100]]
{code}
{code}
[INFO] [TezChild] |runtime.LogicalIOProcessorRuntimeTask|: Final Counters for
attempt_123_0001_1_01_000000_0: Counters: 44 [[File System Counters
FILE_BYTES_READ=123, FILE_BYTES_WRITTEN=0, FILE_READ_OPS=0,
FILE_LARGE_READ_OPS=0, FILE_WRITE_OPS=0, HDFS_BYTES_READ=0,
HDFS_BYTES_WRITTEN=123, HDFS_READ_OPS=4, HDFS_LARGE_READ_OPS=0,
HDFS_WRITE_OPS=2][org.apache.tez.common.counters.TaskCounter
REDUCE_INPUT_GROUPS=12, REDUCE_INPUT_RECORDS=1234, COMBINE_INPUT_RECORDS=0,
SPILLED_RECORDS=123, NUM_SHUFFLED_INPUTS=5, NUM_SKIPPED_INPUTS=0,
NUM_FAILED_SHUFFLE_INPUTS=0, MERGED_MAP_OUTPUTS=5, GC_TIME_MILLIS=12,
CPU_MILLISECONDS=1234, PHYSICAL_MEMORY_BYTES=12345678,
VIRTUAL_MEMORY_BYTES=12345678, COMMITTED_HEAP_BYTES=12345678, OUTPUT_RECORDS=7,
ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=123,
SHUFFLE_BYTES=123, SHUFFLE_BYTES_DECOMPRESSED=1234, SHUFFLE_BYTES_TO_MEM=0,
SHUFFLE_BYTES_TO_DISK=0, SHUFFLE_BYTES_DISK_DIRECT=123,
NUM_MEM_TO_DISK_MERGES=0, NUM_DISK_TO_DISK_MERGES=0, SHUFFLE_PHASE_TIME=12,
MERGE_PHASE_TIME=123, FIRST_EVENT_RECEIVED=12, LAST_EVENT_RECEIVED=12][Shuffle
Errors BAD_ID=0, CONNECTION=0, IO_ERROR=0, WRONG_LENGTH=0, WRONG_MAP=0,
WRONG_REDUCE=0][example.MapredColorCount$ColorCounter OUTPUT_RECORDS=7]]
{code}
The issue is that when we try to retrieve those counters, the Tez
{{YarnRunner}} returns an empty counter object:
{code}
public Counters getJobCounters(JobID jobId)
    throws IOException, InterruptedException {
  // FIXME needs counters support from DAG
  // with a translation layer on client side
  Counters empty = new Counters();
  return empty;
}
{code}
Hence, when we try to get the custom counter, it is initialized to zero and
treated as a new counter, as per {{AbstractCounterGroup}}:
{code}
private synchronized T findCounterImpl(String counterName, boolean create) {
  T counter = counters.get(counterName);
  if (counter == null && create) {
    String localized =
        ResourceBundles.getCounterName(getName(), counterName, counterName);
    return addCounterImpl(counterName, localized, 0);
  }
  return counter;
}
{code}
This is true even for framework counters, as the counters map above is empty
in the Tez case.
Asking [~hitesh] whether this triage makes sense and for comments on a
possible fix. The Mapred YarnRunner equivalent uses {{getCountersProto}} to
get the job counters.
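To make the effect concrete, here is a minimal, self-contained sketch of the behaviour described above. It does not use the real Tez or Hadoop classes: {{Group}}, {{emptyJobCounters}} and {{translatedJobCounters}} are hypothetical stand-ins, and the "translation" is only a plain map copy, assumed for illustration. It shows that a find-or-create lookup against an empty counters object always yields 0, whereas the same lookup returns the reported value once the DAG counters have been copied over.

```java
import java.util.HashMap;
import java.util.Map;

public class CounterLookupSketch {

    /** Hypothetical, simplified stand-in for a counter group: name -> value. */
    static class Group {
        private final Map<String, Long> counters = new HashMap<>();

        /** Same find-or-create shape as findCounterImpl: a miss creates a 0 counter. */
        synchronized Long findCounter(String name, boolean create) {
            Long value = counters.get(name);
            if (value == null && create) {
                counters.put(name, 0L);
                return 0L;
            }
            return value;
        }

        synchronized void set(String name, long value) {
            counters.put(name, value);
        }
    }

    /** Stand-in for a getJobCounters() that ignores the DAG counters. */
    static Group emptyJobCounters() {
        return new Group();
    }

    /** Stand-in for a getJobCounters() that copies the values reported by the DAG. */
    static Group translatedJobCounters(Map<String, Long> dagCounters) {
        Group group = new Group();
        dagCounters.forEach(group::set);
        return group;
    }

    public static void main(String[] args) {
        // Value taken from the task log above (INPUT_RECORDS=100).
        Map<String, Long> dagCounters = new HashMap<>();
        dagCounters.put("INPUT_RECORDS", 100L);

        // Empty object: the lookup misses and a zero-valued counter is created.
        System.out.println(emptyJobCounters().findCounter("INPUT_RECORDS", true));
        // Populated object: the lookup finds the value reported by the DAG.
        System.out.println(translatedJobCounters(dagCounters).findCounter("INPUT_RECORDS", true));
    }
}
```

If this reading is right, the client only ever sees zeros because the lookup never has anything to find, not because the tasks failed to increment the counters.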
> Issues running M/R jobs with Tez
> --------------------------------
>
> Key: TEZ-3459
> URL: https://issues.apache.org/jira/browse/TEZ-3459
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Manuel Godbert
> Attachments: colorCount.sh, mr-example.jar
>
>
> After applying the patch delivered in TEZ-3330, I enriched the
> MapredColorCount example to reproduce some of the other issues I encountered
> on the jobs I wish to see running with Tez.
> I am attaching a jar to the JIRA, including source code, and a script file
> detailing the observed results in comments.
> It addresses 3 issues:
> - The embedded jars in /lib are ignored by Tez, but YARN uses them without
> additional configuration
> - The use of a combiner causes a NullPointerException
> - The counters incremented in the Reporter objects stay at 0
> I am using HDP 2.4.