[
https://issues.apache.org/jira/browse/ARROW-11762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387396#comment-17387396
]
Ben Kietzman commented on ARROW-11762:
--------------------------------------
You're correct for those filter expressions, but I was referring to the
guarantees produced by partitions. Specifically, it is currently legal for a
HivePartitioning to parse either of {{/a=0/}} or
{{/a=0/b=__HIVE_DEFAULT_PARTITION__/}} as {{a == 0}} or as {{a == 0 and
is_null(b)}}. The former guarantee carries no explicit information about
field {{b}}, which we currently treat as equivalent to specifying that it's
null. This is not optimal; we'd prefer to be specific.
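
To make the intended behavior concrete, here is a minimal sketch of a parser
that always emits the explicit form. It is a toy model using only the standard
library, not Arrow's actual Partitioning or Expression API: the names
Guarantee, ParseHivePath, Format and CheckAbsentEqualsNull are hypothetical,
and partition values are kept as plain strings for simplicity.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in for a partition guarantee: field name -> value, where
// std::nullopt means "known to be null". This is NOT Arrow's Expression type.
using Guarantee = std::map<std::string, std::optional<std::string>>;

const std::string kNullFallback = "__HIVE_DEFAULT_PARTITION__";

// Parse a Hive-style path such as "/a=0/b=1/" against the partition field
// names. A field that is absent from the path and a field that carries the
// null fallback are both recorded explicitly as null.
Guarantee ParseHivePath(const std::string& path,
                        const std::vector<std::string>& fields) {
  Guarantee out;
  for (const auto& f : fields) out[f] = std::nullopt;  // default: explicit null

  std::istringstream segments(path);
  std::string segment;
  while (std::getline(segments, segment, '/')) {
    auto eq = segment.find('=');
    if (eq == std::string::npos) continue;  // empty or non key=value segment
    std::string key = segment.substr(0, eq);
    std::string value = segment.substr(eq + 1);
    auto it = out.find(key);
    if (it == out.end()) continue;  // ignore keys outside the schema
    if (value != kNullFallback) it->second = value;
  }
  return out;
}

// Render the guarantee as a conjunction, in the order given by `fields`.
std::string Format(const Guarantee& g, const std::vector<std::string>& fields) {
  std::string out;
  for (const auto& f : fields) {
    if (!out.empty()) out += " and ";
    const auto& v = g.at(f);
    out += v ? f + " == " + *v : "is_null(" + f + ")";
  }
  return out;
}

// Both spellings of "b is missing" yield the same, fully explicit guarantee.
void CheckAbsentEqualsNull() {
  const std::vector<std::string> fields = {"a", "b"};
  auto absent = ParseHivePath("/a=0/", fields);
  auto null_b = ParseHivePath("/a=0/b=__HIVE_DEFAULT_PARTITION__/", fields);
  assert(absent == null_b);
  assert(Format(absent, fields) == "a == 0 and is_null(b)");
}
```

Because every schema field starts out as an explicit null, the two paths can
no longer produce guarantees that differ in form, which is the property
described above.
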
> [C++][Dataset] Refactor Partitioning to explicitly treat null and absent
> fields identically
> -------------------------------------------------------------------------------------------
>
> Key: ARROW-11762
> URL: https://issues.apache.org/jira/browse/ARROW-11762
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Affects Versions: 3.0.0
> Reporter: Ben Kietzman
> Assignee: Weston Pace
> Priority: Major
> Fix For: 6.0.0
>
>
> ARROW-10438 adds support for partition expressions with explicit absence of a
> partition key by including an {{is_null(field_ref("absent key field name"))}}
> in the conjunction. Whenever possible, this should be preferred to an
> equivalent conjunction which simply omits an equality expression for the
> missing field.
> Additionally, since an absent partition key is semantically equivalent to a
> null-valued partition key, we should ensure there is no difference in
> behavior. Currently, {{equal(field_ref("a"), literal(0))}} and
> {{and_(equal(field_ref("a"), literal(0)), is_null("b"))}} are formatted
> differently.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)