James Turton created DRILL-8509:
-----------------------------------
Summary: Pass Unicode string values through the JDBC writer
without escape sequences
Key: DRILL-8509
URL: https://issues.apache.org/jira/browse/DRILL-8509
Project: Apache Drill
Issue Type: Bug
Components: Storage - JDBC
Affects Versions: 1.21.2
Reporter: James Turton
Fix For: Future
When characters outside the printable ASCII range appear in string values
passed to the JDBC writer by a CTAS whose destination is a JDBC storage plugin,
the JDBC writer replaces them with escape sequences inside PostgreSQL-style
Unicode string literals prefixed with 'u&'. For example, a tab character is
replaced with \0009, as [shown
here|https://github.com/apache/drill/issues/2922].
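
To make the symptom concrete, here is a minimal, self-contained sketch of the kind of escaping described above. It is not Drill's or Calcite's code: the class and method names are hypothetical, and the escaping rule (anything outside 0x20..0x7e becomes a four-digit hex escape) is an assumption inferred from the tab example.

```java
// Hypothetical illustration only; none of these names exist in Drill.
final class UnicodeLiteralSketch {

    // Mimics the reported behaviour: wrap the value in a PostgreSQL-style
    // u&'...' literal and replace every character outside the printable
    // ASCII range (assumed here to be 0x20..0x7e) with a \XXXX escape.
    static String escapeAsUnicodeLiteral(String value) {
        StringBuilder sb = new StringBuilder("u&'");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\'') {
                sb.append("''");            // double single quotes
            } else if (c == '\\') {
                sb.append("\\\\");          // double the escape character
            } else if (c < 0x20 || c > 0x7e) {
                sb.append(String.format("\\%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('\'').toString();
    }

    // The behaviour the issue asks for, in simplified form: the value
    // passes through unchanged apart from single-quote doubling. A real
    // MySQL / MariaDB writer would also need to consider backslashes.
    static String quoteAsPlainLiteral(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        String value = "a\tb";
        System.out.println(escapeAsUnicodeLiteral(value)); // u&'a\0009b'
        System.out.println(quoteAsPlainLiteral(value).contains("\t")); // true
    }
}
```

A database that does not understand the u&'...' syntax would presumably store the backslash sequence literally instead of the tab, which matches the linked report.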
# Review character encoding and escaping in JdbcRecordWriter.java and
InsertStatementBuilder.java.
# Review the SqlDialect selection made by the JdbcWriter, looking for why a
PostgreSQL dialect [appears to have been selected for a JDBC connection to
MariaDB|https://github.com/apache/drill/issues/2922].
# Determine whether a MySQL / MariaDB SQL dialect can be selected instead, and
whether this will resolve the issue.
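
As a rough sketch of what the last two steps are looking for, dialect selection could key off the product name that the JDBC driver reports through java.sql.DatabaseMetaData.getDatabaseProductName(). The enum and method below are hypothetical and do not reflect how the JdbcWriter or Calcite currently choose a dialect. The product-name strings being matched are also an assumption.

```java
import java.util.Locale;

// Hypothetical illustration only; not the JdbcWriter's actual logic.
final class DialectChoiceSketch {

    enum Dialect { MYSQL, POSTGRESQL, GENERIC }

    // Chooses a dialect from a product name such as the one returned by
    // DatabaseMetaData.getDatabaseProductName(). The substrings matched
    // here are assumptions about what drivers report.
    static Dialect chooseDialect(String productName) {
        if (productName == null) {
            return Dialect.GENERIC;
        }
        String name = productName.toLowerCase(Locale.ROOT);
        if (name.contains("mariadb") || name.contains("mysql")) {
            return Dialect.MYSQL;
        }
        if (name.contains("postgresql")) {
            return Dialect.POSTGRESQL;
        }
        return Dialect.GENERIC;
    }

    public static void main(String[] args) {
        System.out.println(chooseDialect("MariaDB"));
    }
}
```

If the current selection falls back to a PostgreSQL dialect whenever the product is not recognised, a mapping along these lines would be one place to look.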