GitHub user zellerh commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/257#discussion_r49499008
  
    --- Diff: core/sql/exp/exp_conv.cpp ---
    @@ -9321,6 +9321,33 @@ convDoIt(char * source,
       };
       break;
     
    +// gb2312 -> utf8
    +  case CONV_GBK_F_UTF8_V:
    +  {
    +    char * targetbuf = new char[sourceLen*4+1];
    --- End diff --
    
    This should call CharInfo::getMaxConvertedLenInBytes() in file 
core/sql/common/charinfo.h. That class already knows about GBK. Also, maybe we 
could add an optimization? An ASCII character in GBK maps to a single-byte 
character in UTF-8. As far as I can tell, GBK does not go beyond the Unicode 
Basic Multilingual Plane (BMP), meaning that a 2-byte GBK sequence produces at 
most a 3-byte UTF-8 sequence. So the maximum target length is 3/2 * source 
length, rounded up to the nearest integer. If targetLen is sufficient for that 
conversion, maybe we can avoid allocating a buffer and write directly into 
target? Doing a new and a delete is expensive in this performance-critical 
context.
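
    A rough sketch of the length check I have in mind, assuming the bound 
above holds (1-byte GBK gives 1 byte of UTF-8, 2-byte GBK gives at most 3). 
The helper names maxUtf8BytesForGbk and canWriteDirectly are hypothetical and 
not existing Trafodion functions; the real code should take the bound from 
CharInfo rather than hard-coding it:

```cpp
#include <cstddef>

// Hypothetical helper (not part of Trafodion): upper bound on the number of
// UTF-8 bytes produced from sourceLen bytes of GBK, assuming each GBK
// character is 1 byte (-> 1 UTF-8 byte) or 2 bytes (-> at most 3 UTF-8 bytes).
constexpr std::size_t maxUtf8BytesForGbk(std::size_t sourceLen)
{
  // 3/2 * sourceLen, rounded up, using integer arithmetic only.
  return (3 * sourceLen + 1) / 2;
}

// Hypothetical check: true when the caller's target buffer can hold the
// worst case, so the conversion could write into it without a temporary
// heap buffer.
constexpr bool canWriteDirectly(std::size_t sourceLen, std::size_t targetLen)
{
  return targetLen >= maxUtf8BytesForGbk(sourceLen);
}
```

    When canWriteDirectly() is false, the code would fall back to the 
temporary buffer as in the current patch.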

