[
https://issues.apache.org/jira/browse/TEZ-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977740#comment-13977740
]
Sergey Shelukhin commented on TEZ-1081:
---------------------------------------
[~sseth] [~t3rmin4t0r] fyi
> Expose some basic statistics from org.apache.tez.runtime.api.Input (or similar)
> -------------------------------------------------------------------------------
>
> Key: TEZ-1081
> URL: https://issues.apache.org/jira/browse/TEZ-1081
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Sergey Shelukhin
>
> Hive loads data from org.apache.tez.runtime.api.Input into mapjoin
> hashtables. It would be useful to know in advance:
> 1) How many rows the input contains (this should be easy to add).
> 2) How many unique keys it contains (even an approximation would help).
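
To make the two requested statistics concrete, here is a rough sketch of how a row count and an approximate unique-key count could be gathered in one pass over the keys. Nothing here is Tez or Hive API: the class name InputStatsSketch, its methods, and the bitmap size are all hypothetical, and linear counting is just one simple estimator chosen for illustration (the issue does not name a technique).

```java
import java.util.BitSet;

// Hypothetical helper, NOT part of the Tez API: tracks the row count and an
// approximate number of distinct keys while keys are streamed through it.
final class InputStatsSketch {
    // Assumed bitmap size; must be a power of two so masking picks a slot.
    private static final int BITMAP_BITS = 1 << 16;

    private final BitSet bitmap = new BitSet(BITMAP_BITS);
    private long rowCount;

    // Record one row. Duplicate keys hit the same bit, so they only
    // increase the row count, not the distinct-key estimate.
    void observe(Object key) {
        rowCount++;
        bitmap.set(mix(key.hashCode()) & (BITMAP_BITS - 1));
    }

    long rowCount() {
        return rowCount;
    }

    // Linear counting estimate: n ~= m * ln(m / zeroBits), where m is the
    // bitmap size and zeroBits is the number of bits never set.
    double estimatedUniqueKeys() {
        int zeroBits = BITMAP_BITS - bitmap.cardinality();
        if (zeroBits == 0) {
            // Bitmap saturated: the estimator has no upper bound to offer.
            return Double.POSITIVE_INFINITY;
        }
        return BITMAP_BITS * Math.log((double) BITMAP_BITS / zeroBits);
    }

    // Murmur3-style 32-bit finalizer to spread out weak hashCode() values.
    private static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }
}
```

With this bitmap size the estimate is only useful while the number of distinct keys stays well below a few hundred thousand; a larger input would need a bigger bitmap or a different estimator such as HyperLogLog.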