Copilot commented on code in PR #63438:
URL: https://github.com/apache/doris/pull/63438#discussion_r3272244635
##########
be/test/storage/segment/column_meta_accessor_test.cpp:
##########
@@ -45,6 +46,19 @@ std::string make_test_file_path(const std::string&
file_name) {
return std::string(kTestDir) + "/" + file_name;
}
+TabletColumnPtr create_row_store_test_column(int32_t id) {
+ auto column = std::make_shared<TabletColumn>();
+ column->_unique_id = id;
+ column->_col_name = BeConsts::ROW_STORE_COL;
+ column->_type = FieldType::OLAP_FIELD_TYPE_STRING;
+ column->_is_key = false;
+ column->_is_nullable = true;
+ column->_aggregation = FieldAggregationMethod::OLAP_FIELD_AGGREGATION_NONE;
+ column->_length = 2147483643;
+ column->_index_length = 4;
+ return column;
Review Comment:
`create_row_store_test_column()` sets `TabletColumn` internals directly and
hard-codes the string length as `2147483643`. Using
`set_unique_id/set_name/set_type/...` avoids bypassing invariants like
`_col_name_lower_case`, and replacing the magic number with a named constant
(e.g., derived from `OLAP_STRING_MAX_LENGTH` in `storage/olap_define.h`) makes
the intent and bounds clearer.
##########
be/test/storage/segment/column_meta_accessor_test.cpp:
##########
@@ -677,6 +691,63 @@ TEST(ColumnMetaAccessorTest,
FooterSizeWithManyColumnsExternalVsInline) {
EXPECT_LT(external_footer_size, inline_footer_size / 10);
}
+TEST(ColumnMetaAccessorTest, RowStoreColumnDoesNotUseDictEncoding) {
+ constexpr int32_t kRowStoreUid = 1;
+
+ auto fs = io::global_local_filesystem();
+ static_cast<void>(fs->delete_directory(kTestDir));
+ ASSERT_TRUE(fs->create_directory(kTestDir).ok());
+
+ auto key_column = std::make_shared<TabletColumn>();
+ key_column->_unique_id = 0;
+ key_column->_col_name = "k0";
+ key_column->_type = FieldType::OLAP_FIELD_TYPE_INT;
+ key_column->_is_key = true;
+ key_column->_is_nullable = false;
+ key_column->_length = 4;
+ key_column->_index_length = 4;
Review Comment:
This test initializes `key_column` by writing to `TabletColumn` underscored
members directly. Prefer the public setters (`set_unique_id`, `set_name`,
`set_type`, `set_is_key`, etc.) to keep internal derived fields consistent and
reduce brittleness if `TabletColumn` changes.
##########
be/src/storage/segment/vertical_segment_writer.cpp:
##########
@@ -294,6 +294,12 @@ Status
VerticalSegmentWriter::_create_column_writer(uint32_t cid, const TabletCo
auto page_size = _tablet_schema->row_store_page_size();
opts.data_page_size =
(page_size > 0) ? page_size :
segment_v2::ROW_STORE_PAGE_SIZE_DEFAULT_VALUE;
+ // Row store data is already serialized as a single blob. Keep it on
plain pages
+ // to avoid introducing dictionary pages for the hidden row store
column.
+
opts.meta->set_encoding(_tablet_schema->binary_plain_encoding_default_impl() ==
Review Comment:
Row-store encoding selection uses
`_tablet_schema->binary_plain_encoding_default_impl()` even though
`_create_column_writer` already receives `tablet_schema` as a parameter. Using
`tablet_schema->binary_plain_encoding_default_impl()` makes the dependency
explicit and avoids mismatches if the member schema and the parameter ever
diverge.
##########
be/src/storage/segment/segment_writer.cpp:
##########
@@ -295,6 +295,12 @@ Status SegmentWriter::_create_column_writer(uint32_t cid,
const TabletColumn& co
auto page_size = _tablet_schema->row_store_page_size();
opts.data_page_size =
(page_size > 0) ? page_size :
segment_v2::ROW_STORE_PAGE_SIZE_DEFAULT_VALUE;
+ // Row store data is already serialized as a single blob. Keep it on
plain pages
+ // to avoid introducing dictionary pages for the hidden row store
column.
+
opts.meta->set_encoding(_tablet_schema->binary_plain_encoding_default_impl() ==
Review Comment:
Row-store encoding selection uses
`_tablet_schema->binary_plain_encoding_default_impl()` even though this
function already receives the relevant `schema` parameter. Using
`schema->binary_plain_encoding_default_impl()` avoids accidental divergence if
`_tablet_schema` and the passed schema ever differ (e.g., future refactors /
specialized writers).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]