[ https://issues.apache.org/jira/browse/CALCITE-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16647315#comment-16647315 ]
Ted Xu edited comment on CALCITE-2619 at 10/12/18 3:03 AM:
-----------------------------------------------------------

As for the cost distribution, I did a quick test (concatenating random 10-character strings):

||Name||CPU time||Invocations||
|org.apache.calcite.sql.SqlUtil.translateCharacterSetName(String)|10.7s (0.1%)|16,089|
|java.nio.charset.CharsetEncoder.encode(java.nio.CharBuffer, java.nio.ByteBuffer, boolean)|1.374s (7.1%)|16,089|

Charset.forName has its own cache, so its cost can be ignored.

As for the improvements mentioned above:
# Caching values that have been checked: we considered exactly this approach, but looking up a string value in a cache is still very expensive, not to mention the memory overhead of the cache.
# Skipping verification for common charsets: [~julianhyde], can you elaborate on this one? However, in CJK (China, Japan, Korea) countries UTF-8 is commonly adopted, and we use UTF-8 as our default charset.
# Skipping verification when copying: copying an NlsString changes its value, so skipping verification is still unsafe.
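To make the cost discussion concrete, here is a minimal sketch, assuming the check amounts to running the value through a CharsetEncoder configured to REPORT errors (the encode call that dominates the profile above). It also illustrates the "skip common charset verification" idea with fast paths for ISO-8859-1 and UTF-8, which decide the answer with a char scan instead of encoding. The class and method names (CharsetCheck, isEncodable, isEncodableSlow, hasUnpairedSurrogate) are hypothetical; this is not Calcite's API and does not reproduce how NlsString actually performs its check.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; illustrative only, not Calcite code. */
final class CharsetCheck {
  private CharsetCheck() {}

  /** Full check: actually encodes the value, allocating an output buffer. */
  static boolean isEncodableSlow(String value, Charset charset) {
    CharsetEncoder encoder = charset.newEncoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    try {
      encoder.encode(CharBuffer.wrap(value));
      return true;
    } catch (CharacterCodingException e) {
      return false;
    }
  }

  /** Same answer as isEncodableSlow, with scan-only fast paths for two charsets. */
  static boolean isEncodable(String value, Charset charset) {
    if (charset.equals(StandardCharsets.ISO_8859_1)) {
      // ISO-8859-1 maps exactly the chars U+0000..U+00FF.
      for (int i = 0; i < value.length(); i++) {
        if (value.charAt(i) > '\u00FF') {
          return false;
        }
      }
      return true;
    }
    if (charset.equals(StandardCharsets.UTF_8)) {
      // UTF-8 can encode every code point; the encoder only rejects
      // unpaired surrogates, which a linear scan can detect.
      return !hasUnpairedSurrogate(value);
    }
    return isEncodableSlow(value, charset);
  }

  /** Returns true if the string contains a high or low surrogate without its partner. */
  static boolean hasUnpairedSurrogate(String s) {
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (Character.isHighSurrogate(c)) {
        if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
          i++; // valid pair: skip the low surrogate
        } else {
          return true;
        }
      } else if (Character.isLowSurrogate(c)) {
        return true;
      }
    }
    return false;
  }
}
```

For example, CharsetCheck.isEncodable("中文", StandardCharsets.UTF_8) returns true without running an encoder, while the same call with ISO_8859_1 returns false after a scan. Whether such fast paths save enough in practice would have to be measured against a profile like the one above.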
> Reduce string literal creation cost by removing charset check
> -------------------------------------------------------------
>
>                 Key: CALCITE-2619
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2619
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: Ted Xu
>            Assignee: Ted Xu
>            Priority: Major
>
> The cost of creating an NlsString is very high due to its charset check. In
> some cases, e.g., expression evaluation during partition pruning, NlsString
> creation accounts for 40%+ of the executor's total overhead.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)