James Turton created DRILL-8509:
-----------------------------------

             Summary: Pass Unicode string values through the JDBC writer without escape sequences
                 Key: DRILL-8509
                 URL: https://issues.apache.org/jira/browse/DRILL-8509
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - JDBC
    Affects Versions: 1.21.2
            Reporter: James Turton
             Fix For: Future

When characters outside of the ASCII printable range appear in string values passed to the JDBC writer by a CTAS whose destination is a JDBC storage plugin, the JDBC writer replaces them with escape sequences embedded in PostgreSQL-style Unicode strings prefixed with 'u&'. An example in which a tab character is replaced with \0009 is [visible here|https://github.com/apache/drill/issues/2922].

# Review character encoding and escaping in JdbcRecordWriter.java and InsertStatementBuilder.java.
# Review the SqlDialect selection made by the JdbcWriter, looking for why a PostgreSQL dialect [appears to have been selected for a JDBC connection to MariaDB|https://github.com/apache/drill/issues/2922].
# Determine whether a MySQL / MariaDB SQL dialect can be selected instead, and whether this will resolve the issue.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
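
To make the reported symptom concrete, here is a minimal sketch of the kind of escaping that would turn a tab into \0009 inside a 'u&'-prefixed literal. It is an illustration only: the class and method names are hypothetical, and the exact rules (which characters are escaped, how quotes and backslashes are doubled) are assumptions, not the code in JdbcRecordWriter.java or InsertStatementBuilder.java.

```java
// Hypothetical illustration of the symptom; NOT Drill's or any dialect's actual code.
class UnicodeLiteralSketch {

    // Assumed rule: anything outside printable ASCII (0x20..0x7e) becomes \XXXX,
    // and single quotes and backslashes are doubled.
    static String toUnicodeLiteral(String value) {
        StringBuilder sb = new StringBuilder("u&'");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c < 0x20 || c > 0x7e) {
                // Four lowercase hex digits per UTF-16 code unit.
                sb.append('\\').append(String.format("%04x", (int) c));
            } else if (c == '\'' || c == '\\') {
                sb.append(c).append(c);
            } else {
                sb.append(c);
            }
        }
        return sb.append('\'').toString();
    }

    public static void main(String[] args) {
        // A value containing a tab, as in the linked GitHub issue.
        System.out.println(toUnicodeLiteral("a\tb")); // prints u&'a\0009b'
    }
}
```

A database that does not parse this literal syntax would presumably store the escape text rather than the original character, which is consistent with the behaviour described above.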