Copilot commented on code in PR #63438:
URL: https://github.com/apache/doris/pull/63438#discussion_r3272244635


##########
be/test/storage/segment/column_meta_accessor_test.cpp:
##########
@@ -45,6 +46,19 @@ std::string make_test_file_path(const std::string& 
file_name) {
     return std::string(kTestDir) + "/" + file_name;
 }
 
+TabletColumnPtr create_row_store_test_column(int32_t id) {
+    auto column = std::make_shared<TabletColumn>();
+    column->_unique_id = id;
+    column->_col_name = BeConsts::ROW_STORE_COL;
+    column->_type = FieldType::OLAP_FIELD_TYPE_STRING;
+    column->_is_key = false;
+    column->_is_nullable = true;
+    column->_aggregation = FieldAggregationMethod::OLAP_FIELD_AGGREGATION_NONE;
+    column->_length = 2147483643;
+    column->_index_length = 4;
+    return column;

Review Comment:
   `create_row_store_test_column()` sets `TabletColumn` internals directly and 
hard-codes the string length as `2147483643`. Using 
`set_unique_id/set_name/set_type/...` avoids bypassing invariants like 
`_col_name_lower_case`, and replacing the magic number with a named constant 
(e.g., derived from `OLAP_STRING_MAX_LENGTH` in `storage/olap_define.h`) makes 
the intent and bounds clearer.



##########
be/test/storage/segment/column_meta_accessor_test.cpp:
##########
@@ -677,6 +691,63 @@ TEST(ColumnMetaAccessorTest, 
FooterSizeWithManyColumnsExternalVsInline) {
     EXPECT_LT(external_footer_size, inline_footer_size / 10);
 }
 
+TEST(ColumnMetaAccessorTest, RowStoreColumnDoesNotUseDictEncoding) {
+    constexpr int32_t kRowStoreUid = 1;
+
+    auto fs = io::global_local_filesystem();
+    static_cast<void>(fs->delete_directory(kTestDir));
+    ASSERT_TRUE(fs->create_directory(kTestDir).ok());
+
+    auto key_column = std::make_shared<TabletColumn>();
+    key_column->_unique_id = 0;
+    key_column->_col_name = "k0";
+    key_column->_type = FieldType::OLAP_FIELD_TYPE_INT;
+    key_column->_is_key = true;
+    key_column->_is_nullable = false;
+    key_column->_length = 4;
+    key_column->_index_length = 4;

Review Comment:
   This test initializes `key_column` by writing to `TabletColumn` underscored 
members directly. Prefer the public setters (`set_unique_id`, `set_name`, 
`set_type`, `set_is_key`, etc.) to keep internal derived fields consistent and 
reduce brittleness if `TabletColumn` changes.
   



##########
be/src/storage/segment/vertical_segment_writer.cpp:
##########
@@ -294,6 +294,12 @@ Status 
VerticalSegmentWriter::_create_column_writer(uint32_t cid, const TabletCo
         auto page_size = _tablet_schema->row_store_page_size();
         opts.data_page_size =
                 (page_size > 0) ? page_size : 
segment_v2::ROW_STORE_PAGE_SIZE_DEFAULT_VALUE;
+        // Row store data is already serialized as a single blob. Keep it on 
plain pages
+        // to avoid introducing dictionary pages for the hidden row store 
column.
+        
opts.meta->set_encoding(_tablet_schema->binary_plain_encoding_default_impl() ==

Review Comment:
   Row-store encoding selection uses 
`_tablet_schema->binary_plain_encoding_default_impl()` even though 
`_create_column_writer` already receives `tablet_schema` as a parameter. Using 
`tablet_schema->binary_plain_encoding_default_impl()` makes the dependency 
explicit and avoids mismatches if the member schema and the parameter ever 
diverge.
   



##########
be/src/storage/segment/segment_writer.cpp:
##########
@@ -295,6 +295,12 @@ Status SegmentWriter::_create_column_writer(uint32_t cid, 
const TabletColumn& co
         auto page_size = _tablet_schema->row_store_page_size();
         opts.data_page_size =
                 (page_size > 0) ? page_size : 
segment_v2::ROW_STORE_PAGE_SIZE_DEFAULT_VALUE;
+        // Row store data is already serialized as a single blob. Keep it on 
plain pages
+        // to avoid introducing dictionary pages for the hidden row store 
column.
+        
opts.meta->set_encoding(_tablet_schema->binary_plain_encoding_default_impl() ==

Review Comment:
   Row-store encoding selection uses 
`_tablet_schema->binary_plain_encoding_default_impl()` even though this 
function already receives the relevant `schema` parameter. Using 
`schema->binary_plain_encoding_default_impl()` avoids accidental divergence if 
`_tablet_schema` and the passed schema ever differ (e.g., future refactors / 
specialized writers).
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to