chaokunyang opened a new issue, #1754:
URL: https://github.com/apache/fury/issues/1754

   ## Is your feature request related to a problem? Please describe.
   Currently, Fury uses `java.lang.StringCoding#encode(java.nio.charset.Charset, char[], int, int)` to convert UTF-16 to UTF-8:
   ```java
     static byte[] encode(Charset cs, char[] ca, int off, int len) {
           CharsetEncoder ce = cs.newEncoder();
           int en = scale(len, ce.maxBytesPerChar());
           byte[] ba = new byte[en];
           if (len == 0)
               return ba;
           boolean isTrusted = false;
           if (System.getSecurityManager() != null) {
               if (!(isTrusted = (cs.getClass().getClassLoader0() == null))) {
                   ca =  Arrays.copyOfRange(ca, off, off + len);
                   off = 0;
               }
           }
           ce.onMalformedInput(CodingErrorAction.REPLACE)
             .onUnmappableCharacter(CodingErrorAction.REPLACE)
             .reset();
           if (ce instanceof ArrayEncoder) {
               int blen = ((ArrayEncoder)ce).encode(ca, off, len, ba);
               return safeTrim(ba, blen, cs, isTrusted);
           } else {
               ByteBuffer bb = ByteBuffer.wrap(ba);
               CharBuffer cb = CharBuffer.wrap(ca, off, len);
               try {
                   CoderResult cr = ce.encode(cb, bb, true);
                   if (!cr.isUnderflow())
                       cr.throwException();
                   cr = ce.flush(bb);
                   if (!cr.isUnderflow())
                       cr.throwException();
               } catch (CharacterCodingException x) {
                   throw new Error(x);
               }
               return safeTrim(ba, bb.position(), cs, isTrusted);
           }
       }
   ```
   
   For UTF-8, this method invokes `sun.nio.cs.UTF_8.Encoder#encode`:
   ```java
           public int encode(char[] sa, int sp, int len, byte[] da) {
               int sl = sp + len;
               int dp = 0;
               int dlASCII = dp + Math.min(len, da.length);
   
               // ASCII only optimized loop
               while (dp < dlASCII && sa[sp] < '\u0080')
                   da[dp++] = (byte) sa[sp++];
   
               while (sp < sl) {
                   char c = sa[sp++];
                   if (c < 0x80) {
                       // Have at most seven bits
                       da[dp++] = (byte)c;
                   } else if (c < 0x800) {
                       // 2 bytes, 11 bits
                       da[dp++] = (byte)(0xc0 | (c >> 6));
                       da[dp++] = (byte)(0x80 | (c & 0x3f));
                   } else if (Character.isSurrogate(c)) {
                       if (sgp == null)
                           sgp = new Surrogate.Parser();
                       int uc = sgp.parse(c, sa, sp - 1, sl);
                       if (uc < 0) {
                           if (malformedInputAction() != CodingErrorAction.REPLACE)
                               return -1;
                           da[dp++] = repl;
                       } else {
                           da[dp++] = (byte)(0xf0 | ((uc >> 18)));
                           da[dp++] = (byte)(0x80 | ((uc >> 12) & 0x3f));
                           da[dp++] = (byte)(0x80 | ((uc >>  6) & 0x3f));
                           da[dp++] = (byte)(0x80 | (uc & 0x3f));
                           sp++;  // 2 chars
                       }
                   } else {
                       // 3 bytes, 16 bits
                       da[dp++] = (byte)(0xe0 | ((c >> 12)));
                       da[dp++] = (byte)(0x80 | ((c >>  6) & 0x3f));
                       da[dp++] = (byte)(0x80 | (c & 0x3f));
                   }
               }
               return dp;
           }
   
   ```
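
   To make the branches of the encoder loop concrete, here is a small worked example that prints the UTF-8 bytes for one character from each branch (1, 2, 3, and 4 bytes, plus an unpaired surrogate). It goes through the public `String#getBytes(Charset)` API rather than the internal classes quoted above; the class name `Utf8BranchDemo` and the helper `hex` are made up for illustration.

   ```java
   import java.nio.charset.StandardCharsets;

   final class Utf8BranchDemo {
     /** Returns the UTF-8 encoding of s as space-separated uppercase hex bytes. */
     static String hex(String s) {
       StringBuilder sb = new StringBuilder();
       for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
         if (sb.length() > 0) {
           sb.append(' ');
         }
         sb.append(String.format("%02X", b & 0xFF));
       }
       return sb.toString();
     }

     public static void main(String[] args) {
       System.out.println(hex("A"));            // c < 0x80: 1 byte   -> 41
       System.out.println(hex("\u00E9"));       // c < 0x800: 2 bytes -> C3 A9
       System.out.println(hex("\u20AC"));       // other BMP: 3 bytes -> E2 82 AC
       System.out.println(hex("\uD83D\uDE00")); // surrogate pair: 4 bytes -> F0 9F 98 80
       System.out.println(hex("\uD800"));       // unpaired surrogate, replaced by '?' -> 3F
     }
   }
   ```

   The last line reflects the `CodingErrorAction.REPLACE` setting in `StringCoding#encode`: malformed input is replaced rather than rejected.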
   
   
   
   This JDK implementation is not efficient enough; we need a faster one.
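
   As a starting point for discussion, here is a minimal sketch of one possible direction, not a proposed final implementation and not measured. It assumes the input is a plain `char[]`, skips the `CharsetEncoder` allocation and the security-manager copy, and checks four chars per branch on the ASCII path. The class name `Utf8EncodeSketch` and its `encode` method are hypothetical; unpaired surrogates are replaced with `?` to match the `REPLACE` behavior above.

   ```java
   import java.nio.charset.StandardCharsets;
   import java.util.Arrays;

   final class Utf8EncodeSketch {
     /** Encodes ca[off, off + len) to UTF-8; unpaired surrogates become '?'. */
     static byte[] encode(char[] ca, int off, int len) {
       // Worst case is 3 bytes per UTF-16 code unit (a surrogate pair is 2 units -> 4 bytes).
       byte[] out = new byte[len * 3];
       int sp = off;
       int end = off + len;
       int dp = 0;
       while (sp < end) {
         // ASCII fast path: test four chars with a single branch.
         while (sp + 4 <= end) {
           char c0 = ca[sp], c1 = ca[sp + 1], c2 = ca[sp + 2], c3 = ca[sp + 3];
           if (((c0 | c1 | c2 | c3) & 0xFF80) != 0) {
             break; // at least one non-ASCII char, fall back to the scalar path
           }
           out[dp] = (byte) c0;
           out[dp + 1] = (byte) c1;
           out[dp + 2] = (byte) c2;
           out[dp + 3] = (byte) c3;
           dp += 4;
           sp += 4;
         }
         if (sp >= end) {
           break;
         }
         char c = ca[sp++];
         if (c < 0x80) {
           out[dp++] = (byte) c;
         } else if (c < 0x800) {
           out[dp++] = (byte) (0xc0 | (c >> 6));
           out[dp++] = (byte) (0x80 | (c & 0x3f));
         } else if (Character.isSurrogate(c)) {
           if (Character.isHighSurrogate(c) && sp < end && Character.isLowSurrogate(ca[sp])) {
             int cp = Character.toCodePoint(c, ca[sp++]);
             out[dp++] = (byte) (0xf0 | (cp >> 18));
             out[dp++] = (byte) (0x80 | ((cp >> 12) & 0x3f));
             out[dp++] = (byte) (0x80 | ((cp >> 6) & 0x3f));
             out[dp++] = (byte) (0x80 | (cp & 0x3f));
           } else {
             out[dp++] = (byte) '?'; // unpaired surrogate
           }
         } else {
           out[dp++] = (byte) (0xe0 | (c >> 12));
           out[dp++] = (byte) (0x80 | ((c >> 6) & 0x3f));
           out[dp++] = (byte) (0x80 | (c & 0x3f));
         }
       }
       return Arrays.copyOf(out, dp);
     }

     public static void main(String[] args) {
       String s = "hello, \u4F60\u597D \uD83D\uDE00";
       byte[] expected = s.getBytes(StandardCharsets.UTF_8);
       byte[] actual = encode(s.toCharArray(), 0, s.length());
       System.out.println(Arrays.equals(expected, actual));
     }
   }
   ```

   Whether this is actually faster than the JDK path would need to be confirmed with a benchmark (for example JMH) on the target JDK versions; a real implementation would likely also write directly into Fury's buffer instead of allocating and trimming a temporary array.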
   
   ## Describe the solution you'd like
   
   
   ## Additional context
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

