[
https://issues.apache.org/jira/browse/IMPALA-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043097#comment-17043097
]
Quanlong Huang commented on IMPALA-7784:
----------------------------------------
I also found another bug: string values that are already unescaped get
unescaped again (needsUnescaping_ is set to true) in coordinators when they are
deserialized from Thrift objects. For example, for a partition created with
value "\\\"", the coordinator ends up with the value "\"":
{code:sql}
hive> create table tpart (i int) partitioned by (p string);
hive> insert into tpart partition (p="\"") values (1);
hive> insert into tpart partition (p='\'') values (2);
hive> insert into tpart partition (p="\\\"") values (3);
hive> insert into tpart partition (p='\\\'') values (4);
hive> select * from tpart;
+----------+----------+
| tpart.i | tpart.p |
+----------+----------+
| 1 | " |
| 2 | ' |
| 3 | \" |
| 4 | \" |
+----------+----------+
impala> invalidate metadata tpart;
impala> show partitions tpart;
+-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------------+
| p     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                             |
+-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------------+
| "     | -1    | 1      | 2B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/tpart/p=%22    |
| "     | -1    | 1      | 2B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/tpart/p=%5C%22 |
| '     | -1    | 1      | 2B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/tpart/p=%27    |
| '     | -1    | 1      | 2B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/tpart/p=%5C%27 |
| Total | -1    | 4      | 8B   | 0B           |                   |        |                   |                                                      |
+-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------------+
impala> select * from tpart;
+---+---+
| i | p |
+---+---+
| 3 | " |
| 2 | ' |
| 1 | " |
| 4 | " |
+---+---+
{code}
The cause is that
[LiteralExpr#fromThrift()|https://github.com/apache/impala/blob/2c54dbe22507661664b39cb76849f794cf4743d6/fe/src/main/java/org/apache/impala/analysis/LiteralExpr.java#L147]
calls
[LiteralExpr#create()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/LiteralExpr.java#L90]
which always marks the StringLiteral's needsUnescaping_ as true. As a result,
string values that are already unescaped get unescaped a second time when used
in coordinators.
I'm working on a patch to fix these issues together.
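To illustrate the effect, here is a minimal sketch of double unescaping. It assumes a simplified backslash unescaper: the class DoubleUnescapeDemo and its unescape() helper are hypothetical and are not Impala's actual implementation. It only shows why unescaping an already-unescaped value loses a backslash.

```java
public class DoubleUnescapeDemo {
    // Hypothetical, simplified unescaper (an assumption for illustration):
    // a backslash followed by any character yields that character.
    public static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                // Drop the backslash and keep the escaped character.
                out.append(s.charAt(++i));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The four characters \\\" as typed between the quotes in SQL.
        String asTyped = "\\\\\\\"";
        // First unescape: the intended partition value, backslash + quote.
        String intended = unescape(asTyped);
        // Second (unwanted) unescape: only the quote is left.
        String afterSecondPass = unescape(intended);
        System.out.println("as typed:        " + asTyped);
        System.out.println("unescaped once:  " + intended);
        System.out.println("unescaped twice: " + afterSecondPass);
    }
}
```

In this sketch the value only survives if the second pass is skipped, which is what a needsUnescaping_-style flag set to false would express for values that have already been unescaped.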
> Partition pruning handles escaped strings incorrectly
> -----------------------------------------------------
>
> Key: IMPALA-7784
> URL: https://issues.apache.org/jira/browse/IMPALA-7784
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.0
> Reporter: Csaba Ringhofer
> Assignee: Quanlong Huang
> Priority: Critical
> Labels: correctness
>
> Repro:
> {code}
> create table tpart (i int) partitioned by (p string);
> insert into tpart partition (p="\"") values (1);
> select * from tpart where p = "\"";
> Result:
> Fetched 0 row(s)
> select * from tpart where p = '"';
> Result:
> 1,""""
> {code}
> Hive returns the row for both queries.