Oleksiy Sayankin created HIVE-22980:
---------------------------------------
Summary: Support custom path filter for ORC tables
Key: HIVE-22980
URL: https://issues.apache.org/jira/browse/HIVE-22980
Project: Hive
Issue Type: New Feature
Components: ORC
Reporter: Oleksiy Sayankin
Assignee: Oleksiy Sayankin
The customer is looking for an option to specify a custom path filter for ORC
tables. Details from the customer's requirements are given below.
Problem statement and proposed approach, in the customer's words:
{quote}
Currently, Orc file input format does not take in path filters set in the
property "mapreduce.input.pathfilter.class" or "mapred.input.pathfilter.class".
So, we cannot use custom filters with Orc files.
AcidUtils class has a static filter called "hiddenFilters" which is used by ORC
to filter input paths. If we can pass the custom filter classes (set in the
property mentioned above) to AcidUtils and replace hiddenFilter with a filter
that does an "and" operation over hiddenFilter + customFilters, the filters would
work well.
On local testing, mapreduce.input.pathfilter.class seems to be working for Text
tables but not for ORC tables.
{quote}
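The "and" combination the customer proposes could look roughly like the sketch below. It is an illustration only, not the AcidUtils code: {{PathFilter}} here is a local stand-in for {{org.apache.hadoop.fs.PathFilter}} that takes a String so the example has no Hadoop dependency, the hidden-file rule (names starting with "_" or ".") is an assumption, and the "region=eu" custom filter is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: no Hadoop or Hive classes are used here.
class AndPathFilterSketch {

    /** Local stand-in for org.apache.hadoop.fs.PathFilter; takes the path as a String. */
    interface PathFilter {
        boolean accept(String path);
    }

    /** Assumed hidden-file rule: reject file names starting with '_' or '.'. */
    static final PathFilter HIDDEN_FILTER = path -> {
        String name = path.substring(path.lastIndexOf('/') + 1);
        return !name.startsWith("_") && !name.startsWith(".");
    };

    /** Accepts a path only if every delegate filter accepts it (logical AND). */
    static PathFilter and(List<PathFilter> filters) {
        return path -> {
            for (PathFilter f : filters) {
                if (!f.accept(path)) {
                    return false;
                }
            }
            return true;
        };
    }

    public static void main(String[] args) {
        // Hypothetical custom filter: keep only files under a "region=eu" directory.
        PathFilter custom = path -> path.contains("/region=eu/");
        PathFilter combined = and(Arrays.asList(HIDDEN_FILTER, custom));

        System.out.println(combined.accept("/warehouse/t/region=eu/000000_0"));      // true
        System.out.println(combined.accept("/warehouse/t/region=us/000000_0"));      // false
        System.out.println(combined.accept("/warehouse/t/region=eu/_tmp.000000_0")); // false
    }
}
```

Keeping the hidden-file check as one of the delegates means a custom filter can only narrow the set of input paths; it cannot re-admit temporary or metadata files.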
Our analysis:
{{OrcInputFormat}} and {{FileInputFormat}} are different implementations of the
{{InputFormat}} interface. The property "{{mapreduce.input.pathfilter.class}}" is
respected only by {{FileInputFormat}}, not by any other implementation of
{{InputFormat}}. The customer wants the ability to filter out rows based on
paths/filenames, because current ORC features such as Bloom filters and indexes
are not good enough for them to minimize the number of disk read operations.
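For illustration, an input format that honours the property typically reads the class name from the job configuration and instantiates it reflectively. The sketch below shows that general pattern under stated assumptions: a plain {{Map}} stands in for Hadoop's {{Configuration}}, {{PathFilter}} is a local stand-in interface, and {{OnlyOrcFiles}} and {{loadFilter}} are hypothetical names, not Hive or Hadoop APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a Map stands in for Hadoop's Configuration.
class FilterLookupSketch {

    static final String PATHFILTER_KEY = "mapreduce.input.pathfilter.class";

    /** Local stand-in for org.apache.hadoop.fs.PathFilter; takes the path as a String. */
    interface PathFilter {
        boolean accept(String path);
    }

    /** Hypothetical user-supplied filter: keep only names ending in ".orc". */
    static class OnlyOrcFiles implements PathFilter {
        @Override
        public boolean accept(String path) {
            return path.endsWith(".orc");
        }
    }

    /** Returns the configured filter, or null when the property is unset. */
    static PathFilter loadFilter(Map<String, String> conf) {
        String className = conf.get(PATHFILTER_KEY);
        if (className == null) {
            return null;
        }
        try {
            // Load the class by name and call its no-argument constructor.
            return Class.forName(className)
                    .asSubclass(PathFilter.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create path filter " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(PATHFILTER_KEY, OnlyOrcFiles.class.getName());

        PathFilter filter = loadFilter(conf);
        System.out.println(filter.accept("/warehouse/t/part-0.orc")); // true
        System.out.println(filter.accept("/warehouse/t/part-0.txt")); // false
    }
}
```

An input format that never performs a lookup of this kind simply ignores the property, which matches the behaviour the customer reports for ORC tables.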