[
https://issues.apache.org/jira/browse/ARROW-11549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17367112#comment-17367112
]
Kirill Lykov commented on ARROW-11549:
--------------------------------------
Looks like PR is ready to be merged, but stale for unknown reason. I hope that
my comment will find the author of PR
> [C++][Gandiva] Expression::ToString() doesn't distinguish null and string
> literal 'null', causing issues with FilterCacheKey
> ----------------------------------------------------------------------------------------------------------------------------
>
> Key: ARROW-11549
> URL: https://issues.apache.org/jira/browse/ARROW-11549
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++ - Gandiva
> Reporter: Sikan Chen
> Priority: Critical
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Gandiva's caching mechanism for filters relies on {{FilterCacheKey}} to
> return the correct cached filter. {{FilterCacheKey}}'s hash function factors
> in the string representation of a given expression, however
> {{Expression::ToString()}} doesn't really distinguish null and string literal
> 'null'. As a result, incorrect filters may be returned from the cache.
> In our case, we are building a SQL parser on top of gandiva, but, for
> instance, both of
> {code:java}
> WHERE null = null
> {code}
> and
> {code:java}
> WHERE 'null' = 'null'
> {code}
> result in the same string representation of gandiva expression:
> {code:java}
> bool equal((const string) null, (const string) null)
> {code}
> A simple test to demonstrate the issue:
> {code:java}
> auto f = field("foo", utf8());
> auto schema = arrow::schema({f});
> auto node_a = TreeExprBuilder::MakeStringLiteral("null");
> auto node_b = TreeExprBuilder::MakeStringLiteral("null");
> auto equal_func =
> TreeExprBuilder::MakeFunction("equal", {node_a, node_b},
> arrow::boolean());
> auto condition = TreeExprBuilder::MakeCondition(equal_func);
> std::shared_ptr<Filter> filter1;
> auto status = Filter::Make(schema, condition, &filter1);
> EXPECT_TRUE(status.ok());
> auto string_type = std::make_shared<arrow::StringType>();
> node_a = TreeExprBuilder::MakeNull(string_type);
> node_b = TreeExprBuilder::MakeNull(string_type);
> equal_func =
> TreeExprBuilder::MakeFunction("equal", {node_a, node_b},
> arrow::boolean());
> condition = TreeExprBuilder::MakeCondition(equal_func);
> std::shared_ptr<Filter> filter2;
> status = Filter::Make(schema, condition, &filter2);
> EXPECT_TRUE(status.ok());
> EXPECT_TRUE(filter1.get() != filter2.get()); // fails here
> {code}
>
> Making {{LiteralToStream}} add quotes around the literal seems like a
> quick-and-dirty fix.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)