GitHub user zellerh commented on a diff in the pull request:
https://github.com/apache/incubator-trafodion/pull/257#discussion_r49499008
--- Diff: core/sql/exp/exp_conv.cpp ---
@@ -9321,6 +9321,33 @@ convDoIt(char * source,
};
break;
+// gb2312 -> utf8
+ case CONV_GBK_F_UTF8_V:
+ {
+ char * targetbuf = new char[sourceLen*4+1];
--- End diff --
This should call CharInfo::getMaxConvertedLenInBytes() in file
core/sql/common/charinfo.h rather than hard-coding sourceLen*4+1. That class
already knows about GBK.

Also, maybe we could add an optimization? An ASCII character in GBK maps to a
single-byte character in UTF-8. As far as I can tell, GBK does not go beyond
the Unicode Basic Multilingual Plane (BMP), meaning that a 2-byte GBK sequence
becomes at most a 3-byte UTF-8 sequence. So the worst-case target length is
3/2 * source length, rounded up to the nearest integer. If targetLen is
sufficient for that conversion, maybe we can avoid allocating a buffer and
write directly into target? A new/delete pair is expensive in this
performance-critical context.
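
To make the arithmetic concrete, here is a minimal sketch of the bound and the check I have in mind. The helper names maxUtf8BytesForGbk and canWriteDirectly are hypothetical and are not existing Trafodion functions. The sketch assumes every GBK byte sequence is 1 or 2 bytes long and that invalid input never expands beyond 3 output bytes per 2 input bytes. In the real code the bound should probably come from CharInfo::getMaxConvertedLenInBytes() instead.

```cpp
#include <cstddef>

// Hypothetical helper (not part of Trafodion): worst-case number of UTF-8
// bytes produced from sourceLen bytes of GBK, assuming
//   1-byte GBK (ASCII)  -> 1 byte of UTF-8
//   2-byte GBK sequence -> at most 3 bytes of UTF-8 (BMP only)
inline std::size_t maxUtf8BytesForGbk(std::size_t sourceLen)
{
  // ceil(3/2 * sourceLen) in integer arithmetic
  return (sourceLen * 3 + 1) / 2;
}

// Hypothetical check: true if the caller's target buffer is large enough to
// receive the worst-case result, so no temporary buffer is required.
inline bool canWriteDirectly(std::size_t sourceLen, std::size_t targetLen)
{
  return targetLen >= maxUtf8BytesForGbk(sourceLen);
}
```

With something like this, convDoIt could convert straight into target when canWriteDirectly(sourceLen, targetLen) holds and fall back to the temporary buffer only otherwise.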