[ https://issues.apache.org/jira/browse/DRILL-8280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580817#comment-17580817 ]

ASF GitHub Bot commented on DRILL-8280:
---------------------------------------

jnturton opened a new pull request, #2625:
URL: https://github.com/apache/drill/pull/2625

   # [DRILL-8280](https://issues.apache.org/jira/browse/DRILL-8280): Cannot ANALYZE files containing non-ASCII column names
   
   ## Description
   
   The `merge_schema` function in `SchemaFunctions` is modified to use UTF-8 string parsing, so a column with a non-ASCII name such as "Käse" no longer crashes `ANALYZE TABLE ... REFRESH METADATA`.
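The sketch below illustrates the failure mode in isolation; it is not Drill's code. It contrasts decoding a column name's raw bytes as UTF-8 with a conversion that renders every non-ASCII byte as a `\xHH` escape, which produces exactly the `K\xC3\xA4se` text seen in the error. `Utf8ColumnNameDemo`, `escapeNonAsciiBytes` and `decodeUtf8` are hypothetical names, and the escaping helper is only an assumption about how such output can arise.

```java
import java.nio.charset.StandardCharsets;

public class Utf8ColumnNameDemo {

    // Hypothetical helper: mimics a byte-to-String conversion that renders each
    // non-ASCII byte as a "\xHH" escape instead of decoding the bytes as UTF-8.
    public static String escapeNonAsciiBytes(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int unsigned = b & 0xFF; // Java bytes are signed; widen to 0..255
            if (unsigned < 0x80) {
                sb.append((char) unsigned); // plain ASCII passes through unchanged
            } else {
                sb.append(String.format("\\x%02X", unsigned));
            }
        }
        return sb.toString();
    }

    // Decodes the raw bytes as UTF-8, which recovers the original column name.
    public static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Käse" written with a Java Unicode escape so the source file encoding does not matter.
        byte[] raw = "K\u00e4se".getBytes(StandardCharsets.UTF_8); // 5 bytes: 'ä' is 0xC3 0xA4

        System.out.println(escapeNonAsciiBytes(raw));             // prints K\xC3\xA4se
        System.out.println(decodeUtf8(raw).equals("K\u00e4se")); // prints true
    }
}
```

Once the name is decoded into a proper Java `String`, any JSON serializer can emit it either as raw UTF-8 text or as a `\u00E4` escape, and both forms are valid JSON.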
   
   ## Documentation
   N/A
   
   ## Testing
   TestMetastoreCommands#testNonAsciiColumnName
   




> Cannot ANALYZE files containing non-ASCII column names 
> -------------------------------------------------------
>
>                 Key: DRILL-8280
>                 URL: https://issues.apache.org/jira/browse/DRILL-8280
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 1.20.2
>            Reporter: James Turton
>            Assignee: James Turton
>            Priority: Minor
>             Fix For: 1.20.3
>
>         Attachments: 0_0_0.parquet
>
>
> The attached Parquet file contains a single column named "Käse". If the file
> is saved under /tmp/utf8_col and the Drill command
> {code:java}
> analyze table dfs.tmp.utf8_col columns none refresh metadata;{code}
> is run, the following error is raised while the merge_schema function
> executes.
> {code:java}
> com.fasterxml.jackson.databind.JsonMappingException: Unrecognized character escape 'x' (code 120)
>  at [Source: (String)"{"type":"tuple_schema","columns":[{"name":"K\xC3\xA4se","type":"VARCHAR","mode":"REQUIRED"}]}"; line: 1, column: 47]{code}
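JSON (RFC 8259) permits only a fixed set of escapes after a backslash inside a string: `\"`, `\\`, `\/`, `\b`, `\f`, `\n`, `\r`, `\t` and `\uXXXX`. A `\xHH` sequence is not among them, which is why the parser stops at the 'x'. The sketch below is a hypothetical checker (`JsonEscapeCheck` and `findInvalidEscape` are illustrative names, not a Jackson or Drill API) that locates that character in the schema string from the error. It is simplified and does not validate the hex digits of a `\u` escape.

```java
public class JsonEscapeCheck {

    // Characters RFC 8259 allows after a backslash inside a JSON string.
    private static final String VALID_ESCAPES = "\"\\/bfnrtu";

    // Returns the index of the first character after a backslash that is not a
    // legal JSON escape, or -1 if all escapes are legal. Simplified sketch: it does
    // not check the four hex digits that must follow a 'u' escape.
    public static int findInvalidEscape(String json) {
        for (int i = 0; i < json.length(); i++) {
            if (json.charAt(i) != '\\') {
                continue;
            }
            if (i + 1 >= json.length() || VALID_ESCAPES.indexOf(json.charAt(i + 1)) < 0) {
                return i + 1;
            }
            i++; // skip the escaped character so an escaped backslash is not re-read
        }
        return -1;
    }

    public static void main(String[] args) {
        // The schema JSON from the error message above, as a Java string literal.
        String bad = "{\"type\":\"tuple_schema\",\"columns\":[{\"name\":\"K\\xC3\\xA4se\","
                + "\"type\":\"VARCHAR\",\"mode\":\"REQUIRED\"}]}";
        // Raw (decoded) non-ASCII text and the JSON escape form are both legal.
        String decoded = "{\"name\":\"K\u00e4se\"}";
        String jsonEscaped = "{\"name\":\"K\\u00e4se\"}";

        int pos = findInvalidEscape(bad);
        System.out.println(pos + " " + bad.charAt(pos));   // prints 45 x
        System.out.println(findInvalidEscape(decoded));     // prints -1
        System.out.println(findInvalidEscape(jsonEscaped)); // prints -1
    }
}
```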



--
This message was sent by Atlassian Jira (v8.20.10#820010)
