zclllyybb commented on code in PR #55719:
URL: https://github.com/apache/doris/pull/55719#discussion_r2337479143
##########
be/src/exec/rowid_fetcher.cpp:
##########
@@ -900,6 +900,8 @@ Status RowIdStorageReader::read_batch_external_row(
// Insert the read data into result_block.
for (size_t column_id = 0; column_id < result_block.get_columns().size();
column_id++) {
+ // The non-const Block(result_block) is passed in read_by_rowids, but
columns[i] in get_columns
Review Comment:
Block和Column表示的就是不可变语义,这个函数上游改成MutableBlock吧,调这个函数写完数据可以to_block再做后续的操作。
##########
be/src/runtime/workload_management/resource_context.h:
##########
@@ -80,7 +80,11 @@ class ResourceContext : public
std::enable_shared_from_this<ResourceContext> {
}
void set_workload_group(WorkloadGroupPtr wg) { _workload_group = wg; }
- RuntimeProfile* profile() { return
const_cast<RuntimeProfile*>(resource_profile_.get().get()); }
+ RuntimeProfile* profile() {
Review Comment:
where used this function?
##########
be/src/vec/common/hash_table/hash_map_context.h:
##########
@@ -568,6 +568,8 @@ struct MethodKeysFixed : public MethodBase<TData> {
if (is_column_nullable(*key_columns[i])) {
auto& nullable_col =
assert_cast<ColumnNullable&>(*key_columns[i]);
+ // nullable_col is obtained via key_columns and is itself a
mutable element. However, when accessed
Review Comment:
给get_raw_data留个todo吧,未来换成Slice就好了。
##########
be/src/vec/exec/format/parquet/byte_stream_split_decoder.cpp:
##########
@@ -51,6 +51,8 @@ Status
ByteStreamSplitDecoder::_decode_values(MutableColumnPtr& doris_column,
size_t scale_size = (select_vector.num_values() -
select_vector.num_filtered()) *
(_type_length / primitive_length);
doris_column->resize(doris_column->size() + scale_size);
+ // doris_column is of type MutableColumnPtr, which uses get_raw_data
Review Comment:
记得在 IColumn里面加个TODO,我们应该支持一个返回Slice的非const版本get_raw_data
##########
be/src/vec/functions/function_conv.cpp:
##########
@@ -214,7 +214,7 @@ struct ConvStringImpl {
if (!MathFunctions::handle_parse_result(dst_base, &decimal_num,
parse_res)) {
result_column->insert_data("0", 1);
} else {
- StringRef str_base = MathFunctions::decimal_to_base(context,
decimal_num, dst_base);
+ Slice str_base = MathFunctions::decimal_to_base(context,
decimal_num, dst_base);
result_column->insert_data(reinterpret_cast<const
char*>(str_base.data), str_base.size);
Review Comment:
这两处修改点下面的reinterpret_cast是不是也可以去掉了?
##########
be/src/io/fs/s3_common.h:
##########
@@ -27,6 +27,8 @@ namespace doris {
//
https://stackoverflow.com/questions/13059091/creating-an-input-stream-from-constant-memory
class StringViewStream : Aws::Utils::Stream::PreallocatedStreamBuf, public
std::iostream {
public:
+ // a class for std::string_view, PreallocatedStreamBuf use unsigned char*
+ // so use const_cast
Review Comment:
这个注释不够充分,const_cast合理使用的前提是当前类不会修改buf指向的内存。建议给类型直接改个名字,例如`StringViewOutStream`
##########
be/src/olap/rowset/segment_v2/column_reader.cpp:
##########
@@ -1329,6 +1329,8 @@ Status FileColumnIterator::next_batch(size_t* n,
vectorized::MutableColumnPtr& d
const auto* null_col =
vectorized::check_and_get_column<vectorized::ColumnNullable>(dst.get());
if (null_col != nullptr) {
+ // check_and_get_column returns const T*, dst is an
object that can be modified,
Review Comment:
这个症结在于上面 `null_col` 是个const
Col*,把上面存成ColumnNullable::MutablePtr然后就能get出来Col*了,这里const_cast是可以去掉的
##########
be/src/geo/geo_types.cpp:
##########
@@ -780,6 +780,8 @@ int GeoLine::numPoint() const {
}
S2Point* GeoLine::getPoint(int i) const {
+ // A third-party dependency is used that returns a const S2Point&,
Review Comment:
all caller using this function's return value accept const. so could just
return `const S2Point&` ?
##########
be/src/olap/rowset/segment_v2/inverted_index_fs_directory.cpp:
##########
@@ -856,6 +856,8 @@ lucene::store::IndexOutput*
DorisRAMFSDirectory::createOutput(const char* name)
// get the actual pointer to the output name
char* n = nullptr;
+ // The `name` variable remains unmodified;
Review Comment:
看起来`filesMap`是可以把key加上const的?因为上面char* n可以变成const char*
n,这似乎是唯一一个写入filesMap的。
##########
be/src/util/simd/vstring_function.h:
##########
@@ -218,12 +218,14 @@ class VStringFunctions {
int64_t result_end = dst.size - 1;
// auto SIMD here
+ // dst is a modifiable std::String. Since StringRef is used for
passing, const_cast must be used.
Review Comment:
这函数似乎就在ReverseImpl里面用了,要不直接把第二个参数去掉,返回一个std::string得了
##########
be/src/util/bitmap_expr_calculation.h:
##########
@@ -57,6 +57,12 @@ class BitmapExprCalculation : public
BitmapIntersect<std::string> {
}
}
+ BitmapValue bitmap_calculate() const {
+ // to use a non-const function and no modify, use const_cast is
acceptable
+ // add a const way beause std::map does not have a const operator[]
Review Comment:
operator[]不是const的,那可以用.at()替换么?
##########
be/src/util/uid_util.cpp:
##########
@@ -59,26 +59,24 @@ std::string print_id(const PUniqueId& id) {
static_cast<uint64_t>(id.lo()));
}
-bool parse_id(const std::string& s, TUniqueId* id) {
+bool parse_id(std::string& s, TUniqueId* id) {
DCHECK(id != nullptr);
- const char* hi_part = s.c_str();
- char* colon = const_cast<char*>(strchr(hi_part, '-'));
-
- if (colon == nullptr) {
+ std::size_t pos = s.find('-');
Review Comment:
直接改写的话,我们加一些单测吧,否则容易引入问题,可以用gpt辅助
##########
be/src/vec/columns/column_variant.cpp:
##########
@@ -1428,6 +1428,9 @@ const ColumnVariant::Subcolumn*
ColumnVariant::get_subcolumn_with_cache(const Pa
}
ColumnVariant::Subcolumn* ColumnVariant::get_subcolumn(const PathInData& key,
size_t key_index) {
+ // Since the cache stores const types, non-const versions cannot be used.
const_cast must be employed to
+ // eliminate const semantics. As all nodes are created via
std::make_shared<Node>, modifying them will
+ // not result in uninitialized behavior
Review Comment:
这个非const函数应该可以去掉const_cast,subcolumns是个非const成员,最深层的find_leaf是有const/non-const两个版本的,你可以在这里手动实现一个非const版本的get_subcolumn_with_cache
##########
be/src/vec/columns/column_array.cpp:
##########
@@ -293,6 +293,8 @@ size_t ColumnArray::get_max_row_byte_size() const {
void ColumnArray::serialize_vec(StringRef* keys, size_t num_rows) const {
for (size_t i = 0; i < num_rows; ++i) {
+ // Used in hash_map_context.h, this address is allocated via Arena,
Review Comment:
加个TODO到基类的serialize_vec吧,keys将来考虑用Slice替换一下
##########
be/src/runtime/process_profile.h:
##########
@@ -43,6 +43,8 @@ class ProcessProfile {
std::string print_process_profile_no_root() const {
std::stringstream ss;
std::vector<RuntimeProfile*> profiles;
+ // The `process_profile` is generated via `make_unique` and stored in
`MultiVersion`.
+ // Since it uses `const T` internally, using `const_cast` is valid.
auto version_ptr = _process_profile.get();
auto* process_profile =
const_cast<doris::RuntimeProfile*>(version_ptr.get());
Review Comment:
can we make `get_children` below `const`?
##########
be/src/vec/sink/vtablet_block_convertor.cpp:
##########
@@ -290,6 +290,7 @@ Status OlapTableBlockConvertor::_internal_validate_column(
break;
}
case TYPE_DECIMALV2: {
+ // column_decimal utilizes the ColumnPtr from the block* block in
_validate_data and can be modified.
Review Comment:
直接给validate_and_convert_block加个FIXME注释吧,应该传入MutableBlock来的,因为可能发生数据重整。
##########
be/src/util/bitmap_value.h:
##########
@@ -2431,12 +2432,12 @@ class BitmapValue {
BITMAP = 2, // more than one elements
SET = 3 // elements count less or equal than 32
};
- uint64_t _sv = 0; // store the single value
when _type == SINGLE
- std::shared_ptr<detail::Roaring64Map> _bitmap; // used when _type == BITMAP
+ uint64_t _sv = 0; // store the single value when _type == SINGLE
+ mutable std::shared_ptr<detail::Roaring64Map> _bitmap; // used when _type
== BITMAP
Review Comment:
为什么要改成mutable?如果只是对const 当前类型修改,得优先想别的办法。mutable只适用于某些特定情形,得详细说明下
##########
be/src/vec/core/block.cpp:
##########
@@ -107,6 +107,7 @@ Block::Block(const std::vector<SlotDescriptor>& slots,
size_t block_size,
bool ignore_trivial_slot) {
std::vector<SlotDescriptor*> slot_ptrs(slots.size());
for (size_t i = 0; i < slots.size(); ++i) {
+ // Slots remain unmodified and are used to read column information;
const_cast can be employed.
Review Comment:
这个构造函数真的有用到么?似乎调用的第一个参数都是非const的
##########
be/src/vec/data_types/serde/data_type_bitmap_serde.cpp:
##########
@@ -111,12 +111,12 @@ void DataTypeBitMapSerDe::write_one_cell_to_jsonb(const
IColumn& column, JsonbWr
int64_t row_num) const {
const auto& data_column = assert_cast<const ColumnBitmap&>(column);
result.writeKey(cast_set<JsonbKeyValue::keyid_type>(col_id));
- auto bitmap_value =
const_cast<BitmapValue&>(data_column.get_element(row_num));
+ auto bitmap_value = data_column.get_element(row_num);
// serialize the content of string
auto size = bitmap_value.getSizeInBytes();
// serialize the content of string
- auto* ptr = arena.alloc(size);
- bitmap_value.write_to(const_cast<char*>(ptr));
+ void* ptr = arena.alloc(size);
Review Comment:
这个alloc出来不是本来就是char*么?
##########
be/src/vec/exec/format/orc/vorc_reader.cpp:
##########
@@ -1680,6 +1682,8 @@ Status OrcReader::_fill_doris_array_offsets(const
std::string& col_name,
size_t num_values, size_t*
element_size) {
SCOPED_RAW_TIMER(&_statistics.decode_value_time);
if (num_values > 0) {
+ // The const variable uses a non-const method from a third-party
dependency
+ // without modification, so const_cast can be used.
Review Comment:
你可以之后给doris-thirdparty这个库的apache-orc提个pr,把这个size()改成const的
##########
be/src/vec/functions/function_bitmap.cpp:
##########
@@ -905,6 +905,7 @@ struct BitmapHasAny {
static void vector_vector(const TData& lvec, const TData& rvec, ResTData&
res) {
Review Comment:
`const TData&` to `TData`
##########
be/src/vec/exec/format/parquet/vparquet_group_reader.cpp:
##########
@@ -122,6 +122,7 @@ Status RowGroupReader::init(
for (const auto& read_table_col : _read_table_columns) {
auto read_file_col =
_table_info_node_ptr->children_file_column_name(read_table_col);
+ // The schema is obtained from vparquet_reader, whose value is derived
from parse_thrift_footer, which is a mutable object.
auto* field =
const_cast<FieldSchema*>(schema.get_column(read_file_col));
Review Comment:
get_column可以去掉返回值的const吧?
##########
be/src/vec/data_types/data_type_factory.cpp:
##########
@@ -470,6 +470,8 @@ DataTypePtr DataTypeFactory::create_data_type(const
PrimitiveType primitive_type
// Just Mock A NULL Type in Vec Exec Engine
case TYPE_NULL:
nested = std::make_shared<vectorized::DataTypeUInt8>();
+ // nested is of type DataTypePtr(std::shared_ptr<const IDataType>).
Since it is created
Review Comment:
这个先不给nested赋值,就是非const的sptr,调完set_null_literal再给nested赋值过去就行了吧?
##########
be/src/util/simd/vstring_function.h:
##########
@@ -218,12 +218,14 @@ class VStringFunctions {
int64_t result_end = dst.size - 1;
// auto SIMD here
+ // dst is a modifiable std::String. Since StringRef is used for
passing, const_cast must be used.
Review Comment:
然后可以留个TODO标记,其实更好的做法是直接把reverse的结果往res_data里面写,不过重构pr不做这么大改动了
##########
be/src/vec/exec/format/parquet/vparquet_column_reader.cpp:
##########
@@ -406,6 +408,8 @@ Status ScalarColumnReader::_read_nested_column(ColumnPtr&
doris_column, DataType
NullMap* map_data_column = nullptr;
if (doris_column->is_nullable()) {
SCOPED_RAW_TIMER(&_decode_null_map_time);
+ // doris_column either originates from a mutable block in
vparquet_group_reader
Review Comment:
这个函数的实际调用点只有一个,让它上游read_column_data调用的_converter->get_physical_column这个函数返回值改成MutablePtr是不是就行了
##########
be/src/vec/exec/format/csv/csv_reader.cpp:
##########
@@ -642,6 +642,8 @@ Status CsvReader::_fill_dest_columns(const Slice& line,
Block* block,
IColumn* col_ptr = columns[i].get();
if (!_is_load) {
+ // block is a Block*, and get_by_position returns a ColumnPtr,
Review Comment:
这里最根本的点是,GenericReader的get_next_block入参应该是个MutableBlock才对,给那个地方加个FIXME吧
##########
be/src/vec/data_types/data_type_variant.cpp:
##########
@@ -67,6 +67,7 @@ int64_t
DataTypeVariant::get_uncompressed_serialized_bytes(const IColumn& column
int
be_exec_version) const {
const auto& column_variant = assert_cast<const ColumnVariant&>(column);
if (!column_variant.is_finalized()) {
+ // Icolumn originates from MutablePtr or block, and therefore can be
modified.
Review Comment:
给Block::serialize加个TODO吧,这块儿逻辑得重新思考下,这里可能会调用finalize太trick了
##########
be/src/vec/columns/column_variant.cpp:
##########
@@ -1428,6 +1428,9 @@ const ColumnVariant::Subcolumn*
ColumnVariant::get_subcolumn_with_cache(const Pa
}
ColumnVariant::Subcolumn* ColumnVariant::get_subcolumn(const PathInData& key,
size_t key_index) {
+ // Since the cache stores const types, non-const versions cannot be used.
const_cast must be employed to
+ // eliminate const semantics. As all nodes are created via
std::make_shared<Node>, modifying them will
+ // not result in uninitialized behavior
Review Comment:
哦,find_leaf的non-const版本是你刚实现的,那这里是不是打算改但忘改了?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]