Sikan Chen created ARROW-11549:
----------------------------------
Summary: Expression::ToString() doesn't distinguish null and
string literal 'null', causing issues with FilterCacheKey
Key: ARROW-11549
URL: https://issues.apache.org/jira/browse/ARROW-11549
Project: Apache Arrow
Issue Type: Bug
Components: C++, C++ - Gandiva
Reporter: Sikan Chen
Gandiva's caching mechanism for filters relies on {{FilterCacheKey}} to return
the correct cached filter. {{FilterCacheKey}}'s hash function factors in the
string representation of a given expression, however {{Expression::ToString()}}
doesn't really distinguish null and string literal 'null'. As a result,
incorrect filters may be returned from the cache.
In our case, we are building a SQL parser on top of gandiva, but, for instance,
both of
{code:java}
WHERE null = null
{code}
and
{code:java}
WHERE 'null' = 'null'
{code}
result in the same string representation of gandiva expression:
{code:java}
bool equal((const string) null, (const string) null)
{code}
A simple test to demonstrate the issue:
{code:java}
auto f = field("foo", utf8());
auto schema = arrow::schema({f});
auto node_a = TreeExprBuilder::MakeStringLiteral("null");
auto node_b = TreeExprBuilder::MakeStringLiteral("null");
auto equal_func =
TreeExprBuilder::MakeFunction("equal", {node_a, node_b},
arrow::boolean());
auto condition = TreeExprBuilder::MakeCondition(equal_func);
std::shared_ptr<Filter> filter1;
auto status = Filter::Make(schema, condition, &filter1);
EXPECT_TRUE(status.ok());
auto string_type = std::make_shared<arrow::StringType>();
node_a = TreeExprBuilder::MakeNull(string_type);
node_b = TreeExprBuilder::MakeNull(string_type);
equal_func =
TreeExprBuilder::MakeFunction("equal", {node_a, node_b},
arrow::boolean());
condition = TreeExprBuilder::MakeCondition(equal_func);
std::shared_ptr<Filter> filter2;
status = Filter::Make(schema, condition, &filter2);
EXPECT_TRUE(status.ok());
EXPECT_TRUE(filter1.get() != filter2.get());
{code}
Making {{LiteralToStream}} adding quotes around the literal seems like a
quick-and-dirty fix.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)