[ https://issues.apache.org/jira/browse/HIVE-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805459#comment-13805459 ]
Eric Hanson commented on HIVE-5632:
-----------------------------------
Have you considered adding min/max metadata at the split (as opposed to stripe)
level? If there are, say, 1 million rows per split, you could check whether a
split can be skipped in 1/100th of the time it takes to check the 100 stripes
of 10,000 rows each within that split.
Having hierarchical min/max metadata may be a good idea, both at the split and
stripe level.
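
To make the hierarchical idea concrete, here is a minimal sketch in Python. It
is not ORC or Hive code: the Stats class, the stripes_to_read function, and the
equality predicate are hypothetical names and simplifications chosen for
illustration. It assumes a split-level min/max is stored alongside the
per-stripe min/max, and it counts how many min/max checks are needed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stats:
    """Hypothetical min/max statistics for one column over a range of rows."""
    lo: int
    hi: int

    def may_contain(self, value: int) -> bool:
        # A range can only hold `value` if it lies within [lo, hi].
        return self.lo <= value <= self.hi


def stripes_to_read(split: Stats, stripes: List[Stats],
                    value: int) -> Tuple[List[int], int]:
    """Return (indices of stripes that may match, number of min/max checks)."""
    checks = 1  # one check against the split-level statistics
    if not split.may_contain(value):
        return [], checks  # whole split skipped without touching stripe stats
    selected = []
    for i, stripe in enumerate(stripes):
        checks += 1  # one check per stripe
        if stripe.may_contain(value):
            selected.append(i)
    return selected, checks


# Assumed layout: 100 stripes of 10,000 consecutive values each (1M rows).
stripes = [Stats(i * 10_000, i * 10_000 + 9_999) for i in range(100)]
split = Stats(0, 999_999)

print(stripes_to_read(split, stripes, 2_000_000))  # split skipped in 1 check
print(stripes_to_read(split, stripes, 25_000))     # falls through to stripes
```

When the split-level range excludes the value, one check replaces 100; when it
does not, the cost is one extra check on top of the per-stripe checks.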
> Eliminate splits based on SARGs using stripe statistics in ORC
> --------------------------------------------------------------
>
> Key: HIVE-5632
> URL: https://issues.apache.org/jira/browse/HIVE-5632
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 0.13.0
> Reporter: Prasanth J
> Assignee: Prasanth J
> Labels: orcfile
> Attachments: HIVE-5632.1.patch.txt, HIVE-5632.2.patch.txt,
> orc_split_elim.orc
>
>
> HIVE-5562 provides stripe level statistics in ORC. Stripe level statistics
> combined with predicate pushdown in ORC (HIVE-4246) can be used to eliminate
> the stripes (and thereby splits) that don't satisfy the predicate condition.
> This can greatly reduce unnecessary reads.