pjfanning commented on PR #2637:
URL: https://github.com/apache/drill/pull/2637#issuecomment-1240555174

   With Jackson: the JSON spec (https://www.ietf.org/rfc/rfc4627.txt) mandates 
Unicode with UTF-8 as the default. XML also uses UTF-8 as its default. In my 
experience it is quite rare to see other Unicode charsets used. UTF-8 encoding 
should also use fewer bytes for Latin-alphabet text and numeric data.
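
   As a rough illustration of the size point, here is a minimal sketch that 
compares encoded lengths using only the JDK. The class name `EncodingSizeDemo` 
and the helper `encodedLength` are hypothetical names chosen for this example; 
they are not part of Drill or Jackson. UTF_16LE is used rather than UTF_16 so 
that no byte-order mark is added to the count.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Drill or Jackson.
class EncodingSizeDemo {

    // Returns the number of bytes needed to encode the text in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // ASCII-only sample: letters, a hyphen and digits.
        String latin = "drill-123";
        // Three CJK characters, written as escapes to avoid source-encoding issues.
        String cjk = "\u65e5\u672c\u8a9e";

        for (String sample : new String[] {latin, cjk}) {
            System.out.println(sample.length() + " chars: UTF-8="
                    + encodedLength(sample, StandardCharsets.UTF_8)
                    + " bytes, UTF-16="
                    + encodedLength(sample, StandardCharsets.UTF_16LE)
                    + " bytes");
        }
    }
}
```

   For the ASCII sample UTF-8 needs one byte per character and UTF-16 needs 
two. For the CJK sample the balance flips (three bytes per character in UTF-8 
versus two in UTF-16), so the saving applies to Latin and numeric data rather 
than to all text.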
   
   Java strings are UTF-16 internally, and since Java 9 compact strings store 
Latin-1-only text as one byte per character. I'm not sure if there is a 
performance impact from using UTF-16 instead of UTF-8 
(https://www.dariawan.com/tutorials/java/java-9-compact-string-and-string-new-methods/).
   
   My main concern is correctness and testability rather than performance. 
Choosing one encoding for externally facing data and another for internal use 
would introduce a lot of extra complexity, and possibly confusion about which 
to choose in certain scenarios. It could also lower performance, as you would 
often need to convert between the two encodings.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
