[ 
https://issues.apache.org/jira/browse/HIVE-26639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor resolved HIVE-26639.
---------------------------------
    Resolution: Fixed

> ConstantVectorExpression and ExplainTask shouldn't rely on default charset
> --------------------------------------------------------------------------
>
>                 Key: HIVE-26639
>                 URL: https://issues.apache.org/jira/browse/HIVE-26639
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0-alpha-2
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> HS2 (and other components) rely on UTF-8 encoding, so when strings are 
> stored as bytes, the stored bytes are UTF-8 encoded. Some Java APIs fall back 
> to the default system charset in different ways, which can lead to incorrect 
> encoding when the system default is not UTF-8. This patch intends to fix 2 
> different code paths:
> 1. ConstantVectorExpression
> In my case, this:
> {code}
> LOG.info("default charset name: " + 
> java.nio.charset.Charset.defaultCharset().name());
> LOG.info("getBytes() = " + ((String) constantValue).getBytes());
> LOG.info("getBytes(StandardCharsets.UTF_8) = " + ((String) 
> constantValue).getBytes(StandardCharsets.UTF_8));
> {code}
> led to:
> {code}
> default charset name: US-ASCII
> getBytes() = [B@73dcffb0
> getBytes(StandardCharsets.UTF_8) = [B@2ead0b9c
> {code}
> On the customer side, queries returned wrong results when the filter 
> contained a non-ASCII character (one that is representable in UTF-8):
> {code}
> SELECT b FROM default.rlv_test1 where b='北京';
> ....
> ??
> {code}
> 2. Explain
> Similarly, the explain output was printed to a PrintStream that used a 
> different encoding, leading to a plan like:
> {code}
>                   Map Operator Tree:
>                       TableScan
>                         alias: test_table
>                         filterExpr: (b = '??') (type: boolean)
>                         Statistics: Num rows: 2 Data size: 352 Basic stats: COMPLETE Column stats: COMPLETE
>                         Filter Operator
>                           predicate: (b = '??') (type: boolean)
>                           Statistics: Num rows: 2 Data size: 352 Basic stats: COMPLETE Column stats: COMPLETE
>                           Select Operator
>                             expressions: a (type: int), '??' (type: string), c (type: string)
> {code}
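
The two failure modes described above can be reproduced outside Hive with a small, self-contained sketch. This is not the actual patch: the class name CharsetSketch and the helpers encode and printThrough are hypothetical, and US-ASCII is passed explicitly to stand in for a non-UTF-8 system default. It only illustrates why an explicit StandardCharsets.UTF_8 (for getBytes) and an explicit encoding (for PrintStream) avoid the '??' substitution.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of Hive.
class CharsetSketch {

    // Encode a string with an explicitly chosen charset instead of
    // relying on String.getBytes(), which uses the platform default.
    static byte[] encode(String s, Charset charset) {
        return s.getBytes(charset);
    }

    // Print a string through a PrintStream created with the named encoding,
    // then decode the captured bytes as UTF-8, the way a UTF-8 reader would.
    static String printThrough(String s, String encodingName)
            throws UnsupportedEncodingException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, true, encodingName);
        out.print(s);
        out.flush();
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // The two-character literal from the query above, written with
        // Unicode escapes so the source file itself stays pure ASCII.
        String constant = "\u5317\u4eac";

        // US-ASCII cannot represent either character, so each becomes '?'.
        System.out.println(Arrays.toString(encode(constant, StandardCharsets.US_ASCII)));
        // prints [63, 63]

        // UTF-8 encodes each character as three bytes.
        System.out.println(Arrays.toString(encode(constant, StandardCharsets.UTF_8)));
        // prints [-27, -116, -105, -28, -70, -84]

        // A PrintStream with a non-UTF-8 encoding degrades the text the same way.
        System.out.println(printThrough(constant, "US-ASCII")); // prints ??

        // With an explicit UTF-8 encoding the text round-trips unchanged.
        System.out.println(constant.equals(printThrough(constant, "UTF-8"))); // prints true
    }
}
```

Passing the charset explicitly makes the result independent of the JVM's file.encoding setting, which is the property the issue title asks for.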



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
