[ 
https://issues.apache.org/jira/browse/IMPALA-14237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18019011#comment-18019011
 ] 

ASF subversion and git services commented on IMPALA-14237:
----------------------------------------------------------

Commit 321429eac6400fd8c1b22c8aabb2a11fee381437 in impala's branch 
refs/heads/master from Daniel Vanko
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=321429eac ]

IMPALA-14237: Fix Iceberg partition values encoding

This patch modifies the string overload of
IcebergFunctions::TruncatePartitionTransform so that it always treats
strings as UTF-8 encoded, because the Iceberg specification states
that strings are UTF-8 encoded.
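
As an illustration of why the encoding matters for truncation, here is a
minimal sketch, not the code from this patch. TruncateUtf8 and TruncateBytes
are hypothetical names, and the input is assumed to be valid UTF-8. The
sketch counts code points by skipping UTF-8 continuation bytes, while the
byte-wise variant can cut a multi-byte character in half.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, not Impala's implementation: returns the prefix of `s`
// that holds at most `width` Unicode code points. Assumes `s` is valid UTF-8.
std::string_view TruncateUtf8(std::string_view s, std::size_t width) {
  std::size_t code_points = 0;
  for (std::size_t i = 0; i < s.size(); ++i) {
    // Bytes of the form 10xxxxxx continue a code point; any other byte starts one.
    const bool starts_code_point =
        (static_cast<unsigned char>(s[i]) & 0xC0) != 0x80;
    if (starts_code_point) {
      // The prefix already holds `width` code points: cut before this one.
      if (code_points == width) return s.substr(0, i);
      ++code_points;
    }
  }
  return s;  // The string has `width` code points or fewer.
}

// Byte-wise truncation for contrast: may split a multi-byte code point.
std::string_view TruncateBytes(std::string_view s, std::size_t width) {
  return s.substr(0, width);
}
```

For the UTF-8 bytes of "árvíz" (C3 A1 72 76 C3 AD 7A), TruncateUtf8 with
width 2 returns the three bytes of "ár", whereas TruncateBytes with width 1
returns the lone byte C3, which is not valid UTF-8 on its own.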

Also, for Iceberg tables UrlEncode is no longer called in the
Hive-compatible way but in the standard way, similar to Java's
URLEncoder.encode() (which the Iceberg API also uses), to conform with
existing practice in Hive, Spark and Trino. This included a change in
the set of characters that are not escaped, which now follows the URL
Standard's application/x-www-form-urlencoded format. [1] The predicate
for that set was also renamed from ShouldNotEscape to IsUrlSafe for
better readability.
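
The encoding style described above can be sketched as follows. This is an
assumption-laden illustration, not the code from this patch: IsUrlSafeSketch
and FormUrlEncode are hypothetical names, and the behaviour shown is that of
Java's URLEncoder.encode() on UTF-8 input (ASCII alphanumerics and '*', '-',
'.', '_' kept as-is, space written as '+', every other byte percent-encoded
with uppercase hex). Whether Impala's UrlEncode matches every detail, such as
the handling of spaces, is not stated here.

```cpp
#include <string>
#include <string_view>

// Illustrative only: bytes left unescaped by the
// application/x-www-form-urlencoded percent-encode set.
bool IsUrlSafeSketch(unsigned char c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
         (c >= '0' && c <= '9') ||
         c == '*' || c == '-' || c == '.' || c == '_';
}

// Hypothetical encoder modelled on Java's URLEncoder.encode(): works on the
// raw UTF-8 bytes of `in`, so a multi-byte character yields several %XX groups.
std::string FormUrlEncode(std::string_view in) {
  static const char kHex[] = "0123456789ABCDEF";
  std::string out;
  out.reserve(in.size());
  for (unsigned char c : in) {
    if (IsUrlSafeSketch(c)) {
      out.push_back(static_cast<char>(c));
    } else if (c == ' ') {
      out.push_back('+');  // Form encoding writes a space as '+'.
    } else {
      out.push_back('%');
      out.push_back(kHex[c >> 4]);
      out.push_back(kHex[c & 0x0F]);
    }
  }
  return out;
}
```

With this sketch, a partition value of "a b" becomes "a+b", and the two
UTF-8 bytes of "á" (C3 A1) become "%C3%A1".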

Testing:
 * add and extend e2e tests to check partitions with Unicode characters
 * add be tests to coding-util-test.cc

[1]: https://url.spec.whatwg.org/#application-x-www-form-urlencoded-percent-encode-set

Change-Id: Iabb39727f6dd49b76c918bcd6b3ec62532555755
Reviewed-on: http://gerrit.cloudera.org:8080/23190
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> iceberg_truncate_transform should work well on UTF8 encoded STRINGs
> -------------------------------------------------------------------
>
>                 Key: IMPALA-14237
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14237
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Dániel Gábor Vankó
>            Priority: Blocker
>              Labels: impala-iceberg
>
> The Iceberg spec states that STRINGs are UTF-8 encoded, which means
> iceberg_truncate_transform should work correctly on UTF-8 strings
> regardless of the value of the query option UTF8_MODE.
> Iceberg transforms are declared in be/src/exprs/iceberg-functions.h and
> defined in be/src/exprs/iceberg-functions.cc.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org