[
https://issues.apache.org/jira/browse/TRAFODION-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
He Zhenxing updated TRAFODION-3265:
-----------------------------------
Description:
When inserting values to varchar column of UTF8 character set, and the values
are a mixture of alphabet and Chinese (or other none-ISO8859-1 characters), the
performance will degrade exponentially according to the number of alphabet
characters.
{quote}>> CREATE TABLE t1 (a VARCHAR(65535) CHARACTER SET UTF8);
>> INSERT INTO t1 VALUES ('aaa...中文');
{quote}
Here is a simple test result with different number of 'a's in the insert value:
100 'a's 10 ms
1k 'a's 100 ms
10k 'a's 10 seconds
60k 'a's 160 seconds
The reason is because we internally convert UTF8 to UCS2, which means all 'a's
will become "a\0"s, and then ConstValue::getText() function will try to replace
all '\0' to "
0', and thus cause the performance degrade exponentially with the number of
'a's in the string.
{quote}const NAString ConstValue::getText() const
{
if(getType()->getTypeQualifier() == NA_CHARACTER_TYPE)
{
...
// Change imbedded NULL and \377 chars to \0 and \377
// This comes up in key values quite often.
size_t index;
while((index = result.first('\0')) != NA_NPOS
&& index != result.length())
result(index,1) = "\\0";
while((index = result.first('\377')) != NA_NPOS
&& index != result.length())
result(index,1) = "\\377";
...
}
{quote}
was:
When inserting values to varchar column of UTF8 character set, and the values
are a mixture of alphabet and Chinese (or other none-ISO8859-1 characters), the
performance will degrade exponentially according to the number of alphabet
characters.
{quote}>> CREATE TABLE t1 (a VARCHAR(65535) CHARACTER SET UTF8);
>> INSERT INTO t1 VALUES ('aaa...中文');
{quote}
Here is a simple test result with different number of 'a's in the insert value:
100 'a's 10 ms
1k 'a's 100 ms
10k 'a's 10 seconds
60k 'a's 160 seconds
The reason is because we internally convert UTF8 to UCS2, which means all 'a's
will become "a\0"s, and then ConstValue::getText() function will try to replace
all '\0' to "\\0', and thus cause the performance degrade exponentially with
the number of 'a's in the string.
{quote}{{const NAString ConstValue::getText() const}}
{{{}}
{{ if(getType()->getTypeQualifier() == NA_CHARACTER_TYPE)}}
{{ {}}
{{ NAString result(CmpCommon::statementHeap());}}
{{ if (!textIsValidatedSQLLiteralInUTF8_)}}
{{ result += "\'";}}
{{ if (text_) result += *text_;}}
{{ if (!textIsValidatedSQLLiteralInUTF8_)}}
{{ result += "\'";}}{{// Change imbedded NULL and \377 chars to \0 and \377}}
{{ // This comes up in key values quite often.}}
{{ size_t index;}}
{{ while((index = result.first('\0')) != NA_NPOS}}
{{ && index != result.length())}}
{{ result(index,1) = "\\0";}}
{{ while((index = result.first('\377')) != NA_NPOS}}
{{ && index != result.length())}}
{{ result(index,1) = "\\377";}}{{return result;}}
{{ }}}
{{ else}}
{{ return *text_;}}
{{}}}
{quote}
> INSERT values to VARCHAR column of UTF8 charset can be extremely slow
> ---------------------------------------------------------------------
>
> Key: TRAFODION-3265
> URL: https://issues.apache.org/jira/browse/TRAFODION-3265
> Project: Apache Trafodion
> Issue Type: Bug
> Components: sql-exe
> Affects Versions: any
> Reporter: He Zhenxing
> Priority: Major
> Fix For: 2.4
>
>
> When inserting values to varchar column of UTF8 character set, and the values
> are a mixture of alphabet and Chinese (or other none-ISO8859-1 characters),
> the performance will degrade exponentially according to the number of
> alphabet characters.
> {quote}>> CREATE TABLE t1 (a VARCHAR(65535) CHARACTER SET UTF8);
> >> INSERT INTO t1 VALUES ('aaa...中文');
> {quote}
> Here is a simple test result with different number of 'a's in the insert
> value:
> 100 'a's 10 ms
> 1k 'a's 100 ms
> 10k 'a's 10 seconds
> 60k 'a's 160 seconds
> The reason is because we internally convert UTF8 to UCS2, which means all
> 'a's will become "a\0"s, and then ConstValue::getText() function will try to
> replace all '\0' to "
> 0', and thus cause the performance degrade exponentially with the number of
> 'a's in the string.
> {quote}const NAString ConstValue::getText() const
> {
> if(getType()->getTypeQualifier() == NA_CHARACTER_TYPE)
> {
> ...
> // Change imbedded NULL and \377 chars to \0 and \377
> // This comes up in key values quite often.
> size_t index;
> while((index = result.first('\0')) != NA_NPOS
> && index != result.length())
> result(index,1) = "\\0";
> while((index = result.first('\377')) != NA_NPOS
> && index != result.length())
> result(index,1) = "\\377";
> ...
> }
> {quote}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)