Oleksiy Sayankin created HIVE-22980:
---------------------------------------

             Summary: Support custom path filter for ORC tables
                 Key: HIVE-22980
                 URL: https://issues.apache.org/jira/browse/HIVE-22980
             Project: Hive
          Issue Type: New Feature
          Components: ORC
            Reporter: Oleksiy Sayankin
            Assignee: Oleksiy Sayankin


The customer is looking for an option to specify a custom path filter for ORC tables. The details of the customer's requirement are below.

Problem statement and approach, in the customer's words:
{quote}
Currently, Orc file input format does not take in path filters set in the property "mapreduce.input.pathfilter.class" OR "mapred.input.pathfilter.class". So, we cannot use custom filters with Orc files.

AcidUtils class has a static filter called "hiddenFilters" which is used by ORC to filter input paths. If we can pass the custom filter classes (set in the property mentioned above) to AcidUtils and replace hiddenFilter with a filter that does an "and" operation over hiddenFilter+customFilters, the filters would work well.

On local testing, mapreduce.input.pathfilter.class seems to be working for Text tables but not for ORC tables.
{quote}

Our analysis:

{{OrcInputFormat}} and {{FileInputFormat}} are different implementations of the {{InputFormat}} interface. The property {{mapreduce.input.pathfilter.class}} is respected only by {{FileInputFormat}}, not by any other implementation of {{InputFormat}}. The customer wants the ability to filter out rows based on paths/filenames; current ORC features such as bloom filters and indexes are not good enough for them to minimize the number of disk read operations.
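
To illustrate the "and" composition proposed in the quote, here is a minimal, self-contained sketch. It is not Hive code: the nested {{PathFilter}} interface is a stand-in for Hadoop's {{org.apache.hadoop.fs.PathFilter}} that takes plain strings, {{HIDDEN_FILTER}} is an assumed approximation of the hidden-file filter the customer mentions (it rejects names starting with "_" or "."), and {{PathFilterSketch}}, {{and}} and the {{region=eu}} custom filter are hypothetical names chosen for the example.

```java
import java.util.List;

final class PathFilterSketch {

    /** Stand-in for Hadoop's PathFilter; uses String paths to stay dependency-free (assumption). */
    interface PathFilter {
        boolean accept(String path);
    }

    /** Assumed behaviour of the hidden-file filter: reject names starting with '_' or '.'. */
    static final PathFilter HIDDEN_FILTER = path -> {
        String name = path.substring(path.lastIndexOf('/') + 1);
        return !name.startsWith("_") && !name.startsWith(".");
    };

    /** Builds a filter that accepts a path only if every given filter accepts it. */
    static PathFilter and(List<PathFilter> filters) {
        return path -> {
            for (PathFilter filter : filters) {
                if (!filter.accept(path)) {
                    return false;
                }
            }
            return true;
        };
    }

    public static void main(String[] args) {
        // Hypothetical custom filter: keep only files under a region=eu directory.
        PathFilter custom = path -> path.contains("/region=eu/");
        PathFilter combined = and(List.of(HIDDEN_FILTER, custom));

        List<String> candidates = List.of(
                "/warehouse/t/region=eu/000000_0",
                "/warehouse/t/region=eu/_tmp.000000_0",
                "/warehouse/t/region=us/000000_0");
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + combined.accept(candidate));
        }
    }
}
```

In an actual change, the custom filter class would presumably be loaded from the job configuration rather than written inline, and the combined filter would be used wherever ORC currently applies only the hidden-file filter. That wiring is left out here because it depends on Hive internals not described in this issue.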