[ https://issues.apache.org/jira/browse/DRILL-8280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580817#comment-17580817 ]
ASF GitHub Bot commented on DRILL-8280:
---------------------------------------

jnturton opened a new pull request, #2625:
URL: https://github.com/apache/drill/pull/2625

   # [DRILL-8280](https://issues.apache.org/jira/browse/DRILL-8280): Cannot ANALYZE files containing non-ASCII column names

   ## Description

   The merge_schema function in SchemaFunctions is modified to use UTF-8 string parsing so that a column with a name like "Käse" will no longer crash ANALYZE TABLE REFRESH METADATA.

   ## Documentation
   N/A

   ## Testing
   TestMetastoreCommands#testNonAsciiColumnName


> Cannot ANALYZE files containing non-ASCII column names
> -------------------------------------------------------
>
>                 Key: DRILL-8280
>                 URL: https://issues.apache.org/jira/browse/DRILL-8280
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 1.20.2
>            Reporter: James Turton
>            Assignee: James Turton
>            Priority: Minor
>             Fix For: 1.20.3
>
>         Attachments: 0_0_0.parquet
>
>
> The attached Parquet file contains a single column named "Käse". If it is
> saved under /tmp/utf8_col and then the Drill command
> {code:java}
> analyze table dfs.tmp.utf8_col columns none refresh metadata;{code}
> is run then the following error is raised during the execution of the
> merge_schema function.
> {code:java}
> com.fasterxml.jackson.databind.JsonMappingException: Unrecognized character
> escape 'x' (code 120)
>  at [Source:
> (String)"{"type":"tuple_schema","columns":[{"name":"K\xC3\xA4se","type":"VARCHAR","mode":"REQUIRED"}]}";
> line: 1, column: 47]{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
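
The error quoted above can be illustrated with a small standalone sketch. It is not Drill's code: the class `NonAsciiColumnName` and its helpers are hypothetical, and it assumes (based only on the `K\xC3\xA4se` text in the error message) that the schema string had its non-ASCII bytes rendered as `\xNN` escapes before being handed to the JSON parser. JSON permits only a fixed set of characters after a backslash, and `x` is not one of them, which is consistent with the "Unrecognized character escape 'x'" message. Decoding the same bytes as UTF-8 keeps the column name intact.

```java
import java.nio.charset.StandardCharsets;

public class NonAsciiColumnName {

    // Hypothetical helper (not Drill code): renders every byte outside
    // printable ASCII as a \xNN escape, like the text in the error message.
    static String toEscapedAscii(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int v = b & 0xFF;
            if (v >= 0x20 && v < 0x7F) {
                sb.append((char) v);
            } else {
                sb.append(String.format("\\x%02X", v));
            }
        }
        return sb.toString();
    }

    // Decodes the same bytes as UTF-8, which preserves the original characters.
    static String toUtf8String(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // JSON (RFC 8259) only permits these characters after a backslash.
    static boolean isJsonEscapeChar(char c) {
        return "\"\\/bfnrtu".indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        // The column name from the issue, written with a Unicode escape so the
        // example does not depend on the source file encoding.
        byte[] raw = "K\u00E4se".getBytes(StandardCharsets.UTF_8);

        System.out.println(toEscapedAscii(raw));                    // K\xC3\xA4se
        System.out.println(isJsonEscapeChar('x'));                  // false
        System.out.println(toUtf8String(raw).equals("K\u00E4se")); // true
    }
}
```

The first printed line reproduces the name as it appears in the error message; the second shows why a JSON parser rejects it; the third shows that UTF-8 decoding round-trips the name.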