This is an automated email from the ASF dual-hosted git repository.
yiguolei pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
new 469c8b7ece [Fix](JSON LOAD)fix json load issue when string conform
with RFC 4627 #21390
469c8b7ece is described below
commit 469c8b7ece427302a9cd824ffccde88389093279
Author: GoGoWen <[email protected]>
AuthorDate: Sun Jul 9 17:16:03 2023 +0800
[Fix](JSON LOAD)fix json load issue when string conform with RFC 4627 #21390
To exercise this code path on master, set enable_simdjson_reader=false,
because master uses enable_simdjson_reader=true by default.
Issue Number: close #21389
From the RapidJSON documentation:
Query String
In addition to GetString(), the Value class also contains
GetStringLength(). Here explains why:
According to RFC 4627, JSON strings can contain Unicode character U+0000,
which must be escaped as "\u0000". The problem is that, C/C++ often uses
null-terminated string, which treats \0 as the terminator symbol.
To conform with RFC 4627, RapidJSON supports string containing U+0000
character. If you need to handle this, you can use GetStringLength() to obtain
the correct string length.
For example, after parsing the following JSON to Document d:
{ "s" : "a\u0000b" }
The correct length of the string "a\u0000b" is 3, as returned by
GetStringLength(). But strlen() returns 1.
GetStringLength() can also improve performance, as user may often need to
call strlen() for allocating buffer.
Besides, std::string also support a constructor:
string(const char* s, size_t count);
which accepts the length of string as parameter. This constructor supports
storing null character within the string, and should also provide better
performance.
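The difference between the two lengths can be reproduced without RapidJSON.
The sketch below is only an illustration, not the Doris code: ParsedString,
sample_value, length_via_strlen and copy_with_length are hypothetical names
that stand in for a value exposing a pointer and an explicit length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a parsed JSON string value: a pointer plus an
// explicit length, similar in spirit to GetString() / GetStringLength().
struct ParsedString {
    const char* data;
    std::size_t length;
};

// Returns the bytes that the JSON string "a\u0000b" decodes to.
inline ParsedString sample_value() {
    // 'a', NUL, 'b', followed by a terminating NUL.
    static const char bytes[] = {'a', '\0', 'b', '\0'};
    return ParsedString{bytes, 3};
}

// Length as seen by strlen(): scanning stops at the first NUL byte.
inline std::size_t length_via_strlen(const ParsedString& v) {
    return std::strlen(v.data);
}

// Copy that keeps the embedded NUL by using the (pointer, count) constructor.
inline std::string copy_with_length(const ParsedString& v) {
    return std::string(v.data, v.length);
}
```

With the sample value, length_via_strlen() sees only the leading "a", while
copy_with_length() keeps all three bytes, which is the behavior the change
from strlen() to GetStringLength() relies on.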
---
be/src/vec/exec/format/json/new_json_reader.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/be/src/vec/exec/format/json/new_json_reader.cpp b/be/src/vec/exec/format/json/new_json_reader.cpp
index 157b8a63e9..f6eabaa7cd 100644
--- a/be/src/vec/exec/format/json/new_json_reader.cpp
+++ b/be/src/vec/exec/format/json/new_json_reader.cpp
@@ -889,7 +889,7 @@ Status NewJsonReader::_write_data_to_column(rapidjson::Value::ConstValueIterator
switch (value->GetType()) {
case rapidjson::Type::kStringType:
str_value = value->GetString();
- wbytes = strlen(str_value);
+ wbytes = value->GetStringLength();
break;
case rapidjson::Type::kNumberType:
if (value->IsUint()) {