Copilot commented on code in PR #64476:
URL: https://github.com/apache/doris/pull/64476#discussion_r3404731797


##########
be/src/format/csv/csv_reader.cpp:
##########
@@ -559,28 +563,31 @@ Status 
CsvReader::get_parsed_schema(std::vector<std::string>* col_names,
 }
 
 Status CsvReader::_deserialize_nullable_string(IColumn& column, Slice& slice) {
-    auto& null_column = assert_cast<ColumnNullable&>(column);
-    if (_empty_field_as_null) {
-        if (slice.size == 0) {
-            null_column.insert_data(nullptr, 0);
-            return Status::OK();
-        }
+    // This is the per-row per-column hot path of CSV load (load reads every 
column as
+    // nullable string). The column type was already verified by the checked 
assert_cast
+    // in _reserve_nullable_string_columns at the beginning of the batch, so 
the casts
+    // here can skip the release-build typeid check.
+    auto& null_column = assert_cast<ColumnNullable&, 
TypeCheckOnRelease::DISABLE>(column);

Review Comment:
   This change bypasses the SerDe layer and relies on unchecked (release-build) 
assert_casts in a very hot and correctness-critical path. There are existing 
CsvReader unit tests, but none that exercise nullable-string CSV cell parsing 
(null_format matching, empty_field_as_null, escape/quote escaping). Adding a 
focused unit/regression test for these cases would help ensure future refactors 
keep semantics identical.



##########
be/src/format/text/text_reader.cpp:
##########
@@ -166,20 +169,23 @@ Status TextReader::_validate_line(const Slice& line, 
bool* success) {
 }
 
 Status TextReader::_deserialize_nullable_string(IColumn& column, Slice& slice) 
{
-    auto& null_column = assert_cast<ColumnNullable&>(column);
+    // Hot path of hive text load, see 
CsvReader::_deserialize_nullable_string. The
+    // column type was verified by the checked assert_cast in
+    // _reserve_nullable_string_columns at the beginning of the batch.
+    auto& null_column = assert_cast<ColumnNullable&, 
TypeCheckOnRelease::DISABLE>(column);
+    auto& string_column = assert_cast<ColumnString&, 
TypeCheckOnRelease::DISABLE>(

Review Comment:
   This inlines hive-text string parsing and uses unchecked (release-build) 
assert_casts. There don’t appear to be TextReader-specific tests covering null 
detection and escape handling for this nullable-string hot path; adding at 
least one targeted test would help guard the semantic equivalence claim.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to