pjfanning commented on PR #2637: URL: https://github.com/apache/drill/pull/2637#issuecomment-1240555174
With Jackson - the JSON spec (https://www.ietf.org/rfc/rfc4627.txt) mandates Unicode with UTF-8 as the default, and XML also defaults to UTF-8. In my experience it is quite rare to see other Unicode charsets used. UTF-8 should use fewer bytes for Latin-alphabet text and numeric data. Java strings are UTF-16 internally, although since Java 9 compact strings store Latin-1-only text as one byte per character (https://www.dariawan.com/tutorials/java/java-9-compact-string-and-string-new-methods/). I'm not sure whether there is a performance impact from using UTF-16 instead of UTF-8. My main concern is correctness and testability rather than performance. Choosing one encoding for externally facing data and another internally would add a lot of complexity and possibly confusion about which to choose in a given scenario, and possibly lower performance, since you would often need to convert between the two encodings.
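To make the byte-count point concrete, here is a minimal sketch that compares encoded sizes using only the JDK. The class name `EncodingSizeDemo`, its helper methods, and the sample strings are hypothetical and chosen for illustration; none of them come from Drill or Jackson.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Drill or Jackson.
public class EncodingSizeDemo {

    // Number of bytes needed to encode the text as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes needed to encode the text as UTF-16.
    // UTF_16BE is used so that no byte-order mark is added to the count.
    static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Encode to UTF-8 and decode again, as would happen at an
    // externally facing JSON or XML boundary.
    static String roundTripUtf8(String text) {
        byte[] encoded = text.getBytes(StandardCharsets.UTF_8);
        return new String(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Non-ASCII samples use Unicode escapes so the source file
        // compiles the same way regardless of its own encoding.
        String[] samples = {
            "{\"id\": 12345}",      // ASCII-only JSON
            "na\u00efve",           // Latin text with one accented letter
            "\u65e5\u672c\u8a9e"    // three CJK characters
        };
        for (String s : samples) {
            System.out.printf("chars=%d utf-8=%d bytes utf-16=%d bytes%n",
                    s.length(), utf8Length(s), utf16Length(s));
        }
    }
}
```

ASCII and mostly-Latin text comes out smaller in UTF-8, while the CJK sample is smaller in UTF-16, so the size advantage depends on the data. The round trip through UTF-8 is lossless for well-formed strings; the conversion itself is the extra step that using two different encodings would introduce.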