[ https://issues.apache.org/jira/browse/HIVE-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558440#comment-13558440 ]
Phabricator commented on HIVE-2780:
-----------------------------------
navis has commented on the revision "HIVE-2780 [jira] Implement more
restrictive table sampler".
INLINE COMMENTS
ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java:489 ok.
ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java:583 ok.
ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java:657 I
recall that this code was copied from CombineHiveInputFormat; I'll check that.
ql/src/java/org/apache/hadoop/hive/ql/io/SplitSampler.java:34 ok.
ql/src/test/results/clientpositive/split_sample_sampler.q.out:27 The original
implementation provided split-level granularity; the purpose of this patch is
to make it finer (per row). This means the underlying files must be
splittable, as you pointed out previously.
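To make "per-row granularity" concrete, here is a minimal sketch, not the SplitSampler from the patch: a hypothetical `RowLevelSampler` that keeps an evenly spaced `percent` share of the rows it sees inside a split, instead of keeping or dropping the split as a whole. The class name, the `accept()` method, and the evenly spaced selection rule are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical row-level sampler (not Hive's SplitSampler): keeps roughly
 * `percent` percent of the rows it is offered, spread evenly, rather than
 * selecting or skipping a whole split.
 */
public class RowLevelSampler {
    private final double percent; // requested sample size, 0..100
    private long seen = 0;        // number of rows offered so far

    public RowLevelSampler(double percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be in [0, 100]");
        }
        this.percent = percent;
    }

    /** Returns true if the next row should be emitted. */
    public boolean accept() {
        long before = (long) Math.floor(seen * percent / 100.0);
        seen++;
        long after = (long) Math.floor(seen * percent / 100.0);
        // Keep the row exactly when the running quota crosses the next integer.
        return after > before;
    }

    public static void main(String[] args) {
        RowLevelSampler sampler = new RowLevelSampler(10.0);
        List<Integer> kept = new ArrayList<>();
        for (int row = 0; row < 50; row++) {
            if (sampler.accept()) {
                kept.add(row);
            }
        }
        System.out.println(kept); // prints [9, 19, 29, 39, 49]
    }
}
```

Because the decision is made per row, this kind of sampler only helps if a split can be read partially, which is why the underlying files need to be splittable.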
REVISION DETAIL
https://reviews.facebook.net/D1623
BRANCH
DPAL-722
To: JIRA, ashutoshc, navis
> Implement more restrictive table sampler
> ----------------------------------------
>
> Key: HIVE-2780
> URL: https://issues.apache.org/jira/browse/HIVE-2780
> Project: Hive
> Issue Type: Improvement
> Reporter: Navis
> Assignee: Navis
> Priority: Trivial
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2780.D1623.1.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2780.D1623.2.patch, HIVE-2780.D1623.3.patch
>
>
> Current table sampling scans the whole block, so more rows are included than
> expected, especially for small tables.
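The overshoot described in the issue can be illustrated with a little arithmetic. The sketch below assumes that split-level sampling takes whole splits, in order, until the requested share of rows is covered (always at least one split); `SamplingGranularity`, `splitLevelRows` and `rowLevelRows` are hypothetical names, not Hive APIs.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration (not Hive code): rows returned by a percentage
 * sample when whole splits are the unit of selection vs. individual rows.
 */
public class SamplingGranularity {

    /** Assumed split-level behavior: take whole splits, in order, until the target is covered. */
    public static long splitLevelRows(long[] rowsPerSplit, double percent) {
        long total = Arrays.stream(rowsPerSplit).sum();
        long target = (long) Math.ceil(total * percent / 100.0);
        long taken = 0;
        int splits = 0;
        for (long rows : rowsPerSplit) {
            // Always take at least one split; stop once the target is reached.
            if (splits > 0 && taken >= target) {
                break;
            }
            taken += rows;
            splits++;
        }
        return taken;
    }

    /** Row-level behavior: keep only the requested share of rows. */
    public static long rowLevelRows(long[] rowsPerSplit, double percent) {
        long total = Arrays.stream(rowsPerSplit).sum();
        return (long) Math.floor(total * percent / 100.0);
    }

    public static void main(String[] args) {
        long[] small = {1000L};        // small table: a single split
        long[] large = new long[100];  // large table: 100 splits
        Arrays.fill(large, 1000L);

        System.out.printf("small: split-level=%d, row-level=%d%n",
                splitLevelRows(small, 1.0), rowLevelRows(small, 1.0));
        System.out.printf("large: split-level=%d, row-level=%d%n",
                splitLevelRows(large, 1.0), rowLevelRows(large, 1.0));
    }
}
```

Under these assumptions, a 1% sample of a one-split table with 1,000 rows returns all 1,000 rows at split granularity but only 10 at row granularity, while for 100 such splits both approaches return 1,000 rows. That is why the excess is most visible on small tables.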
--
This message is automatically generated by JIRA.