mrhhsg commented on code in PR #63389:
URL: https://github.com/apache/doris/pull/63389#discussion_r3330511558
##########
be/src/common/config.cpp:
##########
@@ -1059,6 +1059,7 @@ DEFINE_mInt64(small_column_size_buffer, "100");
// Perform the always_true check at intervals determined by runtime_filter_sampling_frequency
DEFINE_mInt32(runtime_filter_sampling_frequency, "32");
+DEFINE_mBool(enable_expr_zonemap_filter, "true");
Review Comment:
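Addressed in cc67a753274: removed the BE mutable config and added an FE session variable
`enable_expr_zonemap_filter`, which is passed down to BE through `TQueryOptions`. The OLAP
segment/page and Parquet row-group/page paths now both read the query option. The default
remains true, and the tests now control it through `RuntimeState` as well.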
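
A minimal sketch of that flow, assuming a simplified stand-in for the Thrift-generated struct. `QueryOptionsSketch`, `RuntimeStateSketch`, and `expr_zonemap_filter_enabled()` are hypothetical names used for illustration only; they are not the actual Doris types or accessors:

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for TQueryOptions: the field is optional, so it may be
// unset if the sender did not fill it in.
struct QueryOptionsSketch {
    std::optional<bool> enable_expr_zonemap_filter;
};

// Hypothetical stand-in for RuntimeState, which holds the per-query options.
class RuntimeStateSketch {
public:
    explicit RuntimeStateSketch(QueryOptionsSketch opts) : _opts(opts) {}

    // Falls back to true when the option is unset, matching the stated default.
    bool expr_zonemap_filter_enabled() const {
        return _opts.enable_expr_zonemap_filter.value_or(true);
    }

private:
    QueryOptionsSketch _opts;
};
```

With this shape, both the OLAP and the Parquet paths can ask the same per-query state object instead of reading a process-wide config, which is what makes the switch controllable per session.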
##########
be/src/exec/operator/olap_scan_operator.cpp:
##########
@@ -222,6 +222,20 @@ Status OlapScanLocalState::_init_profile() {
_stats_filtered_counter = ADD_COUNTER(_segment_profile, "RowsStatsFiltered", TUnit::UNIT);
_stats_rp_filtered_counter =
ADD_COUNTER(_segment_profile, "RowsZoneMapRuntimePredicateFiltered", TUnit::UNIT);
+ _expr_zonemap_filtered_segment_counter =
+ ADD_COUNTER(_segment_profile, "ExprZoneMapFilteredSegments", TUnit::UNIT);
+ _expr_zonemap_filtered_page_counter =
+ ADD_COUNTER(_segment_profile, "ExprZoneMapFilteredPages", TUnit::UNIT);
+ _expr_zonemap_unsupported_counter =
+ ADD_COUNTER(_segment_profile, "ExprZoneMapUnsupportedExprs", TUnit::UNIT);
+ _expr_zonemap_type_mismatch_counter =
Review Comment:
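This counter mainly tracks the conservative fallback taken when the type of a materialized
expression value on the BE side does not match the slot/zonemap type. For example, the slot
may be a numeric column while the materialized comparison literal/IN-list cannot be safely
compared against that column type, or a dictionary rewrite may make the expression value
type no longer equivalent to the original slot type. In those cases ZoneMap cannot be used
for exact pruning, and the evaluation can only return unsupported/may-match. The counter
exists to observe these fallbacks; it does not mean that normal SQL will fail.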
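
The fallback can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the PR's code: `Value`, `ZoneMap`, `ZoneMapStats`, and `eval_eq` are hypothetical, and a two-alternative `std::variant` stands in for the real Field types:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical simplified value: either an integer or a string.
using Value = std::variant<int64_t, std::string>;

enum class ZoneMapResult { kNoMatch, kMayMatch };

struct ZoneMap {
    Value min;
    Value max;
};

struct ZoneMapStats {
    int64_t type_mismatch = 0;  // counts conservative fallbacks
};

// Evaluates `col = literal` against a zone map.
ZoneMapResult eval_eq(const ZoneMap& zm, const Value& literal, ZoneMapStats& stats) {
    // Values of different kinds cannot be compared safely, so do not prune:
    // record the fallback and report may-match.
    if (literal.index() != zm.min.index() || literal.index() != zm.max.index()) {
        ++stats.type_mismatch;
        return ZoneMapResult::kMayMatch;
    }
    // Same kind on both sides: std::variant's operator< compares the held values.
    if (literal < zm.min || zm.max < literal) {
        return ZoneMapResult::kNoMatch;
    }
    return ZoneMapResult::kMayMatch;
}
```

The mismatch path never returns `kNoMatch`, so it can only cost pruning opportunities; it cannot drop rows or raise an error.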
##########
be/src/exec/operator/olap_scan_operator.cpp:
##########
@@ -222,6 +222,20 @@ Status OlapScanLocalState::_init_profile() {
_stats_filtered_counter = ADD_COUNTER(_segment_profile, "RowsStatsFiltered", TUnit::UNIT);
_stats_rp_filtered_counter =
ADD_COUNTER(_segment_profile, "RowsZoneMapRuntimePredicateFiltered", TUnit::UNIT);
+ _expr_zonemap_filtered_segment_counter =
+ ADD_COUNTER(_segment_profile, "ExprZoneMapFilteredSegments", TUnit::UNIT);
+ _expr_zonemap_filtered_page_counter =
+ ADD_COUNTER(_segment_profile, "ExprZoneMapFilteredPages", TUnit::UNIT);
+ _expr_zonemap_unsupported_counter =
+ ADD_COUNTER(_segment_profile, "ExprZoneMapUnsupportedExprs", TUnit::UNIT);
+ _expr_zonemap_type_mismatch_counter =
+ ADD_COUNTER(_segment_profile, "ExprZoneMapTypeMismatch", TUnit::UNIT);
+ _in_zonemap_point_check_counter =
+ ADD_COUNTER(_segment_profile, "InZoneMapPointCheckCount", TUnit::UNIT);
+ _in_zonemap_range_only_counter =
+ ADD_COUNTER(_segment_profile, "InZoneMapRangeOnlyCount", TUnit::UNIT);
+ _in_zonemap_point_check_skipped_counter =
+ ADD_COUNTER(_segment_profile, "InZoneMapPointCheckSkippedDueToThreshold", TUnit::UNIT);
Review Comment:
Addressed in cc67a753274: removed `InZoneMapPointCheckSkippedDueToThreshold` along with the
corresponding stats/profile fields and UT assertions. A large IN-list that skips the point
check is now counted only under `InZoneMapRangeOnlyCount`.
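
A rough sketch of the two counting paths, assuming that "range only" means comparing the IN-list's overall min/max against the zone. The function `in_list_may_match`, the struct `InZoneMapStats`, and the `max_point_check_size` threshold parameter are hypothetical and only illustrate the idea:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct InZoneMapStats {
    int64_t point_check_count = 0;  // analogous to InZoneMapPointCheckCount
    int64_t range_only_count = 0;   // analogous to InZoneMapRangeOnlyCount
};

// Returns true if the zone [zone_min, zone_max] may contain a value from in_list.
// max_point_check_size is a hypothetical threshold above which per-value checks
// are skipped.
bool in_list_may_match(const std::vector<int64_t>& in_list, int64_t zone_min,
                       int64_t zone_max, std::size_t max_point_check_size,
                       InZoneMapStats& stats) {
    if (in_list.empty()) {
        return false;
    }
    if (in_list.size() > max_point_check_size) {
        // Large list: only compare the list's overall [min, max] with the zone.
        ++stats.range_only_count;
        auto [lo, hi] = std::minmax_element(in_list.begin(), in_list.end());
        return *lo <= zone_max && *hi >= zone_min;
    }
    // Small list: check whether any individual value falls inside the zone.
    ++stats.point_check_count;
    return std::any_of(in_list.begin(), in_list.end(), [&](int64_t v) {
        return v >= zone_min && v <= zone_max;
    });
}
```

Each evaluation increments exactly one of the two counters, so a separate "skipped due to threshold" counter would carry no extra information in this model.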
##########
be/src/exec/runtime_filter/runtime_filter_consumer_helper.cpp:
##########
@@ -17,10 +17,14 @@
#include "exec/runtime_filter/runtime_filter_consumer_helper.h"
+#include "common/logging.h"
Review Comment:
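Addressed in cc67a753274: this was just an unrelated diff left over from an earlier include
fix. It has been removed from the PR, and `runtime_filter_consumer_helper.cpp` no longer
carries a header change without a corresponding code change.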
##########
be/src/exec/runtime_filter/runtime_filter_consumer_helper.h:
##########
@@ -39,6 +39,7 @@ class RuntimeFilterConsumerHelper {
// Called by Operator.
Status acquire_runtime_filter(RuntimeState* state, VExprContextSPtrs& conjuncts,
const RowDescriptor& row_descriptor);
+
// The un-arrival filters will be checked every time the scanner is scheduled.
Review Comment:
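Addressed in cc67a753274: same as above. The unrelated blank-line/formatting diff in
`runtime_filter_consumer_helper.h` has been removed, so no changes without code logic are
carried into this PR.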
##########
be/src/exprs/function/function.h:
##########
@@ -226,6 +229,17 @@ class IFunctionBase {
virtual bool can_push_down_to_index() const { return false; }
virtual bool is_blockable() const { return false; }
+
+ virtual bool is_deterministic() const { return false; }
Review Comment:
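Addressed in cc67a753274: removed the `IFunctionBase::is_deterministic()` check, and
"determinism" is no longer used to describe ZoneMap capability. Whether a given function
supports ZoneMap derivation is now expressed only through `can_evaluate_zonemap_filter()` /
`evaluate_zonemap_filter()`, which avoids conflating that capability with determinism.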
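
As an illustration of the hook pair, here is a sketch with simplified signatures. Only the two method names come from this PR; every type with a `Sketch` suffix, the integer-only context, and `can_skip` are hypothetical:

```cpp
#include <cassert>

enum class ZoneMapFilterResultSketch { kUnsupported, kMayMatch, kNoMatch };

// Hypothetical context: just the min/max of one integer column in a zone.
struct ZoneMapEvalContextSketch {
    int min;
    int max;
};

class FunctionBaseSketch {
public:
    virtual ~FunctionBaseSketch() = default;
    // Functions opt in explicitly; the default is "no ZoneMap support".
    virtual bool can_evaluate_zonemap_filter() const { return false; }
    virtual ZoneMapFilterResultSketch evaluate_zonemap_filter(
            const ZoneMapEvalContextSketch&) const {
        return ZoneMapFilterResultSketch::kUnsupported;
    }
};

// Example function that opts in: `col > bound`.
class GreaterThanSketch : public FunctionBaseSketch {
public:
    explicit GreaterThanSketch(int bound) : _bound(bound) {}
    bool can_evaluate_zonemap_filter() const override { return true; }
    ZoneMapFilterResultSketch evaluate_zonemap_filter(
            const ZoneMapEvalContextSketch& ctx) const override {
        return ctx.max > _bound ? ZoneMapFilterResultSketch::kMayMatch
                                : ZoneMapFilterResultSketch::kNoMatch;
    }

private:
    int _bound;
};

// A zone is skipped only when the function supports ZoneMap evaluation and
// proves that no row can match.
bool can_skip(const FunctionBaseSketch& fn, const ZoneMapEvalContextSketch& ctx) {
    return fn.can_evaluate_zonemap_filter() &&
           fn.evaluate_zonemap_filter(ctx) == ZoneMapFilterResultSketch::kNoMatch;
}
```

Nothing in this contract refers to determinism: a function states directly whether it can reason about min/max statistics.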
##########
be/src/exprs/function/function_string.cpp:
##########
@@ -1341,8 +1342,21 @@ using FunctionCrc32 = FunctionUnaryToType<Crc32Impl, NameCrc32>;
using FunctionStringUTF8Length = FunctionUnaryToType<StringUtf8LengthImpl, NameStringUtf8Length>;
using FunctionStringSpace = FunctionUnaryToType<StringSpace, NameStringSpace>;
using FunctionIsValidUTF8 = FunctionUnaryToType<IsValidUTF8Impl, NameIsValidUTF8>;
-using FunctionStringStartsWith =
- FunctionBinaryToType<DataTypeString, DataTypeString, StringStartsWithImpl, NameStartsWith>;
+class FunctionStringStartsWith : public FunctionBinaryToType<DataTypeString, DataTypeString,
+ StringStartsWithImpl, NameStartsWith> {
+public:
+ static FunctionPtr create() { return std::make_shared<FunctionStringStartsWith>(); }
+ bool is_deterministic() const override { return true; }
+ ZoneMapFilterResult evaluate_zonemap_filter(const ZoneMapEvalContext& ctx,
+ const VExprSPtrs& arguments) const override {
+ return expr_zonemap::eval_starts_with_zonemap(ctx, arguments);
Review Comment:
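The centralized implementation is kept for now because `starts_with` shares the same Field
type compatibility, CHAR range disabling, `ZoneMapEvalContext` statistics, and conservative
fallback logic with comparison/IN/null. The function side only exposes the can/evaluate
hooks, and the concrete ZoneMap semantics live in expr_zonemap, which avoids maintaining
these edge cases separately in each function file. If more string functions are supported
later, the file can be split by function family to reduce single-file complexity.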
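
For readers unfamiliar with prefix pruning, one way to derive a `starts_with` result from min/max is sketched below. This is not the body of `expr_zonemap::eval_starts_with_zonemap`; `eval_starts_with`, `PrefixZoneMapResult`, and the `is_char_column` flag are hypothetical, and byte-wise string ordering is assumed:

```cpp
#include <cassert>
#include <string_view>

enum class PrefixZoneMapResult { kUnsupported, kMayMatch, kNoMatch };

// Decides whether any string in [zone_min, zone_max] can start with `prefix`.
PrefixZoneMapResult eval_starts_with(std::string_view zone_min, std::string_view zone_max,
                                     std::string_view prefix, bool is_char_column) {
    // Assumption: range pruning is disabled for CHAR columns, so stay conservative.
    if (is_char_column) {
        return PrefixZoneMapResult::kUnsupported;
    }
    if (prefix.empty()) {
        return PrefixZoneMapResult::kMayMatch;  // every string starts with ""
    }
    // Truncating to the prefix length preserves lexicographic order, so every
    // value in the zone has a head between min_head and max_head.
    std::string_view min_head = zone_min.substr(0, prefix.size());
    std::string_view max_head = zone_max.substr(0, prefix.size());
    if (min_head > prefix || max_head < prefix) {
        return PrefixZoneMapResult::kNoMatch;
    }
    return PrefixZoneMapResult::kMayMatch;
}
```

For a zone spanning "apple" to "banana", the prefix "ap" may match while the prefix "c" cannot, since the heads of both bounds sort below "c". The type checks, CHAR handling, and statistics surrounding this core comparison are the parts shared with comparison/IN/null evaluation.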
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]