Weiyang Zhao created ARROW-10574:
------------------------------------
Summary: [Python][Parquet] Enhance hive partition filtering
Key: ARROW-10574
URL: https://issues.apache.org/jira/browse/ARROW-10574
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Weiyang Zhao
I would like to enhance partition filters in methods such as:
{{pyarrow.parquet.ParquetDataset(path, filters)}}
I am proposing two enhancements:
# For the "in" and "not in" operators, the value currently must be a set. In my
experience, passing a list simply returns no results, with no warning. I would
like to change this to accept any Iterable (set, list, tuple, etc.) except
strings. Internally I will construct a set from the Iterable to remove
duplicate elements.
# I would like to add a "like" operator with the semantics of SQL LIKE.
Alternatively, a regular expression could be used. I prefer SQL LIKE semantics
for consistency with SQL.
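As a rough illustration of both proposals, the sketch below shows one way the
value handling could work. It is not the actual patch: the helper names
(normalize_membership_value, like_to_regex, matches) are made up for this
example, it uses only the Python standard library rather than pyarrow
internals, and it assumes partition values are compared as strings.

```python
import re
from collections.abc import Iterable


def normalize_membership_value(value):
    """Hypothetical helper: coerce the value of an "in"/"not in" filter to a set."""
    # Strings are iterable, but iterating one yields single characters,
    # which is almost never what the caller meant, so reject them explicitly.
    if isinstance(value, (str, bytes)):
        raise TypeError('"in"/"not in" expects a collection of values, not a string')
    if not isinstance(value, Iterable):
        raise TypeError('"in"/"not in" expects an iterable of values')
    # Building a set removes duplicates and gives fast membership tests.
    return set(value)


def like_to_regex(pattern):
    """Hypothetical helper: compile a SQL LIKE pattern to a regex.

    Only the two standard wildcards are handled: % (any run of characters)
    and _ (exactly one character). ESCAPE clauses are not supported.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            # Everything else is literal, including regex metacharacters.
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matches(partition_value, op, value):
    """Evaluate one (op, value) pair against a partition value (a string)."""
    if op == "in":
        return partition_value in normalize_membership_value(value)
    if op == "not in":
        return partition_value not in normalize_membership_value(value)
    if op == "like":
        # fullmatch: LIKE must match the whole value, not a substring.
        return like_to_regex(value).fullmatch(partition_value) is not None
    raise ValueError(f"unsupported operator: {op}")
```

With this sketch, matches("2020", "in", ["2019", "2020"]) accepts a list just
as it would a set, and matches("2020-11", "like", "2020-%") selects every
partition value starting with "2020-". Passing a bare string as the "in" value
raises a TypeError instead of silently matching nothing.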
I have already made the changes and written test cases locally. Once this is
approved, I can submit them.
Thank you.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)