[
https://issues.apache.org/jira/browse/HIVE-16341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957384#comment-15957384
]
Gopal V commented on HIVE-16341:
--------------------------------
Yup, LGTM - +1
> Tez Task Execution Summary has incorrect input record counts on some operators
> ------------------------------------------------------------------------------
>
> Key: HIVE-16341
> URL: https://issues.apache.org/jira/browse/HIVE-16341
> Project: Hive
> Issue Type: Bug
> Components: Tez
> Reporter: Jason Dere
> Assignee: Jason Dere
> Attachments: HIVE-16341.1.patch, HIVE-16341.2.patch
>
>
> {noformat}
> Task Execution Summary
> ------------------------------------------------------------------------------------------------------------------------------
> VERTICES    TOTAL_TASKS  FAILED_ATTEMPTS  KILLED_TASKS  DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
> ------------------------------------------------------------------------------------------------------------------------------
> Map 1               167                0             0      17640.00     2,109,200       23,068    150,000,004      11,995,136
> Map 11                5                0             0      10559.00        71,960          633      4,023,690         799,900
> Map 13                1                0             0       2244.00         6,090           29             25               3
> Map 3                 1                0             0       2849.00         7,080           99             25               3
> Map 5               271                0             0      55834.00    12,934,890      358,376  1,500,000,001   1,500,000,161
> Map 7               241                0             0      91243.00     5,020,860       71,182  1,827,250,341     652,413,443
> Reducer 10            1                0             0       1010.00         1,900            0              4               0
> Reducer 12            1                0             0       3854.00         1,320            0        799,900               1
> Reducer 14            1                0             0       1420.00         3,790           45              3               1
> Reducer 2             1                0             0       9720.00         6,220          122     11,995,136               1
> Reducer 4             1                0             0        810.00         2,100          105              3               1
> Reducer 6             1                0             0      24863.00         3,260            5  1,500,000,161               1
> Reducer 8           412                0             0      88215.00    17,106,440      184,524  2,165,208,640           1,864
> Reducer 9             2                0             0      29752.00         3,980            0          1,864               4
> ------------------------------------------------------------------------------------------------------------------------------
> {noformat}
> Seeing this on queries using runtime filtering. The INPUT_RECORDS values
> look incorrect for the reducers that are responsible for aggregating the
> min/max/bloomfilter (Reducers 12, 14, 2, 6). For example, Reducer 2 shows 12M
> input records, but the task logs for Reducer 2 show only 167 input records.
> It looks like Map 1 has 2 different output vertices (Reducer 2 and Reducer
> 8), but the total output rows for Map 1 (rather than just the rows going to
> each specific vertex) are being counted in the input rows for both Reducer 2
> and Reducer 8.
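
To make the suspected miscount concrete, here is a minimal sketch in Python. It is an illustration only, not the Hive/Tez code and not the attached patch. The function names and the edge map are hypothetical. The 167 rows on the Map 1 -> Reducer 2 edge come from the task logs mentioned above; the Map 1 -> Reducer 8 figure is an assumption, taken as the remainder of Map 1's 11,995,136 OUTPUT_RECORDS.

```python
# Hypothetical per-edge output row counts, keyed by (source vertex, destination vertex).
# 167 is the Reducer 2 input count seen in the task logs; the Reducer 8 share is
# assumed to be whatever is left of Map 1's total OUTPUT_RECORDS (11,995,136).
edge_output_records = {
    ("Map 1", "Reducer 2"): 167,
    ("Map 1", "Reducer 8"): 11_995_136 - 167,
}


def vertex_output_total(edges, source):
    """Total rows emitted by a vertex across all of its outgoing edges."""
    return sum(rows for (src, _dst), rows in edges.items() if src == source)


def input_records_by_vertex_total(edges, dest):
    """Suspected buggy behaviour: credit each upstream vertex's *total* output."""
    sources = {src for (src, dst) in edges if dst == dest}
    return sum(vertex_output_total(edges, src) for src in sources)


def input_records_by_edge(edges, dest):
    """Expected behaviour: count only the rows sent on edges that end at dest."""
    return sum(rows for (_src, dst), rows in edges.items() if dst == dest)


if __name__ == "__main__":
    for reducer in ("Reducer 2", "Reducer 8"):
        print(
            reducer,
            "vertex-total:", input_records_by_vertex_total(edge_output_records, reducer),
            "per-edge:", input_records_by_edge(edge_output_records, reducer),
        )
```

Under these assumptions the vertex-total approach reports Map 1's full output for both reducers, which matches the 11,995,136 shown for Reducer 2 in the summary, while the per-edge approach gives the 167 rows seen in the logs.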