Ye Ding created CALCITE-2163:
--------------------------------

             Summary: Using "UTF16" as default charset failed
                 Key: CALCITE-2163
                 URL: https://issues.apache.org/jira/browse/CALCITE-2163
             Project: Calcite
          Issue Type: Bug
            Reporter: Ye Ding
            Assignee: Julian Hyde


I have a project that needs to handle non-ASCII characters, so I set the 
default charset to "UTF16" via the "saffron.default.charset" property. It 
fails with the stack trace below.

{code:txt}
Caused by: java.nio.charset.UnsupportedCharsetException: UTF-16
        at org.apache.calcite.util.NlsString.<init>(NlsString.java:72)
        at org.apache.calcite.rex.RexBuilder.makeLiteral(RexBuilder.java:882)
        at org.apache.calcite.rex.RexBuilder.<init>(RexBuilder.java:117)
        at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1046)
        at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154)
        ... 29 more
{code}

After exploring the related source code, I found a suspicious piece of code 
that may cause the problem.

Here is the relevant code block from RexBuilder, between lines 869 and 883.

{code:java}
case CHAR:
  // Character literals must have a charset and collation. Populate
  // from the type if necessary.
  assert o instanceof NlsString;
  NlsString nlsString = (NlsString) o;
  if ((nlsString.getCollation() == null)
      || (nlsString.getCharset() == null)) {
    assert type.getSqlTypeName() == SqlTypeName.CHAR;
    assert type.getCharset().name() != null;
    assert type.getCollation() != null;
    o = new NlsString(
        nlsString.getValue(),
        type.getCharset().name(),
        type.getCollation());
  }
{code}

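In the NlsString constructor call at the end of this block, 
type.getCharset().name() passes a *Java* charset name. For the "UTF16" 
setting that name is "UTF-16", which matches the name in the exception above.

But judging from the code of NlsString's constructor, charsetName is supposed 
to be a *SQL* charset name.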

{code:java}
  public NlsString(
      String value,
      String charsetName,
      SqlCollation collation) {
    assert value != null;
    if (null != charsetName) {
      charsetName = charsetName.toUpperCase(Locale.ROOT);
      this.charsetName = charsetName;
      String javaCharsetName =
          SqlUtil.translateCharacterSetName(charsetName);
      if (javaCharsetName == null) {
        throw new UnsupportedCharsetException(charsetName);
      }
      this.charset = Charset.forName(javaCharsetName);
      CharsetEncoder encoder = charset.newEncoder();
      ....
{code}
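
To illustrate the suspected mismatch, here is a minimal standalone sketch that 
does not use Calcite at all. The lookup table and the translateSqlName method 
are hypothetical stand-ins for a SQL-to-Java charset name translation; they 
are assumptions made for illustration and do not reproduce the actual mapping 
in SqlUtil.translateCharacterSetName. Only java.nio.charset.Charset is real 
API here.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class CharsetNameMismatch {
  // Hypothetical SQL-name -> Java-name table (an assumption for this sketch,
  // NOT Calcite's real mapping). It only recognizes SQL-style names.
  private static final Map<String, String> SQL_TO_JAVA = new HashMap<>();
  static {
    SQL_TO_JAVA.put("UTF16", "UTF-16LE");
    SQL_TO_JAVA.put("UTF-16LE", "UTF-16LE");
    SQL_TO_JAVA.put("LATIN1", "ISO-8859-1");
  }

  // Stand-in for a SQL charset name translation: returns null when the
  // name is not a known SQL charset name.
  public static String translateSqlName(String sqlName) {
    return SQL_TO_JAVA.get(sqlName.toUpperCase(Locale.ROOT));
  }

  // Resolves a name or alias to the canonical name Java reports for it.
  public static String javaCanonicalName(String name) {
    return Charset.forName(name).name();
  }

  public static void main(String[] args) {
    // "UTF16" is accepted by Java as an alias, but name() returns "UTF-16".
    String javaName = javaCanonicalName("UTF16");
    System.out.println(javaName);

    // Feeding the Java name back into the SQL-name table finds nothing.
    System.out.println(translateSqlName(javaName));

    // "UTF-16LE" is the same string on both sides, so the round trip works.
    System.out.println(translateSqlName(javaCanonicalName("UTF-16LE")));
  }
}
```

In this sketch the round trip breaks for "UTF16" because the Java canonical 
name "UTF-16" is not a key in the SQL-name table, while "UTF-16LE" happens to 
be spelled identically on both sides. That would be consistent with the 
workaround described below, but it is only a model of the suspected behavior.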

I have not read and fully understood the code, so I'm not sure whether this 
is the root cause of the problem. For now I've managed to work around it by 
setting "saffron.default.charset" to "UTF-16LE".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
