github-actions[bot] commented on code in PR #64624:
URL: https://github.com/apache/doris/pull/64624#discussion_r3461985590
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/agg/Percentile.java:
##########
@@ -78,10 +78,16 @@ private Percentile(NullableAggregateFunctionParams
functionParams) {
@Override
public void checkLegalityBeforeTypeCoercion() {
- if (!getArgument(1).isConstant()) {
+ Expression quantile = getArgument(1);
+ if (!quantile.isConstant()) {
throw new AnalysisException(
"percentile requires second parameter must be a constant :
" + this.toSql());
}
+ double value =
((org.apache.doris.nereids.trees.expressions.literal.Literal)
quantile).getDouble();
Review Comment:
This range check is running before constant folding/type coercion, but it
treats every accepted constant as a numeric `Literal`. `isConstant()` is
broader than `Literal`, so `percentile(k, 0.1 + 0.2)` passes the guard and then
this cast throws `ClassCastException` during analysis. The same pre-coercion
read also breaks explicitly castable string literals: these functions use
`ExplicitlyCastableSignature`, so a call like `percentile(k, '0.5')` can be
matched by casting the second argument to `DOUBLE`, but
`StringLikeLiteral.getDouble()` is not numeric parsing; it byte-encodes the
string for ordering and makes `'0.5'` look far outside `[0, 1]` before coercion
runs. The same pattern was added to `PercentileApprox` and
`PercentileApproxWeighted`. Please validate after folding/coercion, or
explicitly reject unsupported literal types with a controlled compatibility
decision.
##########
be/src/exprs/aggregate/aggregate_function_percentile.h:
##########
@@ -156,6 +158,27 @@ class AggregateFunctionPercentileApproxBase
String get_name() const override { return "percentile_approx"; }
+ const std::vector<size_t>& get_const_argument_indexes() const override {
+ static const std::vector<size_t> two_params {1};
+ static const std::vector<size_t> three_params {1, 2};
+ return this->argument_types.size() == 3 ? three_params : two_params;
+ }
+
+ Status set_const_arguments(const std::vector<ColumnWithTypeAndName>&
arguments) override {
+ _quantile = assert_cast<const ColumnFloat64&,
TypeCheckOnRelease::DISABLE>(
+ assert_cast<const
ColumnConst&>(*arguments[0].column).get_data_column())
+ .get_data()[0];
+ _compression =
+ this->argument_types.size() == 3
Review Comment:
The weighted variants cannot share this arity-based setter as-is.
`AggregateFunctionPercentileApproxWeightedThreeParams::get_const_argument_indexes()`
returns only `{2}`, so this method receives a single captured constant, but
because `argument_types.size() == 3` it still reads `arguments[1]` for
compression. That is out of range for the three-argument weighted form.
Conversely, the four-argument weighted form returns `{2, 3}`, but because its
arity is 4 this code takes the `: 10000` branch and ignores the user-supplied
compression. Please make the setter use the captured argument count or override
it for the weighted subclasses so `percentile_approx_weighted(v, w, q)` and
`percentile_approx_weighted(v, w, q, compression)` both initialize correctly.
##########
be/src/exprs/aggregate/aggregate_function_ai_agg.h:
##########
@@ -274,18 +278,32 @@ class AggregateFunctionAIAgg final
bool is_blockable() const override { return true; }
+ const std::vector<size_t>& get_const_argument_indexes() const override {
+ static const std::vector<size_t> indexes {0, 2};
+ return indexes;
+ }
+
+ Status set_const_arguments(const std::vector<ColumnWithTypeAndName>&
arguments) override {
+ const auto& resource_name_column =
+ assert_cast<const
ColumnConst&>(*arguments[0].column).get_data_column();
+ _resource_name =
Review Comment:
These `StringRef`s point into the temporary columns built by
`AggFnEvaluator::_init_const_arguments()`. That helper stores each
`execute_const_expr()` result in a local `std::vector<ColumnWithTypeAndName>`
and destroys it immediately after `set_const_arguments()` returns, while
`ColumnString::get_data_at()` returns a pointer into the column's `chars`
buffer. After `open()`, `_resource_name` and `_task` are dangling, so the first
`add()` can read freed memory when it calls `prepare(_resource_name, _task)`.
Please copy these constants into owned `std::string` storage on the aggregate
function (or into the state) before keeping them past the setter call.
##########
be/src/exprs/aggregate/aggregate_function_percentile.h:
##########
@@ -693,25 +724,32 @@ class AggregateFunctionPercentileArray final
return
std::make_shared<DataTypeArray>(make_nullable(std::make_shared<DataTypeFloat64>()));
}
+ const std::vector<size_t>& get_const_argument_indexes() const override {
+ static const std::vector<size_t> indexes {1};
+ return indexes;
+ }
+
+ Status set_const_arguments(const std::vector<ColumnWithTypeAndName>&
arguments) override {
+ const auto& quantile_array = assert_cast<const ColumnArray&,
TypeCheckOnRelease::DISABLE>(
+ assert_cast<const
ColumnConst&>(*arguments[0].column).get_data_column());
+ const auto& offset_column_data = quantile_array.get_offsets();
+ const auto& quantile_data =
Review Comment:
This loses the null-element check for constant arrays that are not
`ArrayLiteral`, for example `percentile_array(x, array(0.5, NULL))`. The FE
guard only iterates `ArrayLiteral`, while the scalar `array(...)` function is
still a constant expression. On the BE side this setter unwraps the array's
nullable nested column and copies only `ColumnFloat64` values, ignoring the
nested null map; null elements therefore use their default nested payload,
commonly `0.0`, and pass `check_quantile()`. The old row path explicitly threw
when `null_maps[i]` was set. Please keep the null-map validation when
extracting the cached quantile array.
##########
be/src/exprs/vectorized_agg_fn.cpp:
##########
@@ -349,14 +379,26 @@ Status AggFnEvaluator::_calc_argument_columns(Block*
block) {
SCOPED_TIMER(_expr_timer);
_agg_columns.resize(_input_exprs_ctxs.size());
std::vector<int> column_ids(_input_exprs_ctxs.size());
+ std::vector<int> materialize_column_ids;
+ materialize_column_ids.reserve(_input_exprs_ctxs.size());
for (int i = 0; i < _input_exprs_ctxs.size(); ++i) {
+ if (!_const_argument_idx.empty() && _const_argument_idx[i]) {
+ continue;
+ }
int column_id = -1;
RETURN_IF_ERROR(_input_exprs_ctxs[i]->execute(block, &column_id));
column_ids[i] = column_id;
+ materialize_column_ids.push_back(column_id);
+ }
+ if (!materialize_column_ids.empty()) {
+ materialize_block_inplace(*block, materialize_column_ids.data(),
+ materialize_column_ids.data() +
materialize_column_ids.size());
}
- materialize_block_inplace(*block, column_ids.data(),
- column_ids.data() + _input_exprs_ctxs.size());
for (int i = 0; i < _input_exprs_ctxs.size(); ++i) {
Review Comment:
Dropping the column for required constants loses the nullable-wrapper path
for constant NULLs. Existing tests still call `percentile(level + 0.1, NULL)`
and `percentile_approx(level, NULL, 2048)`. Those aggregate functions are
wrapped by the nullable aggregate wrapper, whose variadic `add()` expects
`columns[i]` to be a `ColumnNullable` whenever `is_nullable[i]` is true and
dereferences it before deciding to skip the row. With this change
`_agg_columns[i]` is `nullptr` for the constant argument, and the new
percentile setters also assume `ColumnConst(ColumnFloat64)` rather than
`ColumnConst(ColumnNullable(...))`. Please preserve nullable const columns for
the wrapper or explicitly encode constant-null semantics in
`set_const_arguments()` so these supported NULL cases still produce NULL
instead of crashing or using a default quantile.
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/agg/PercentileApprox.java:
##########
@@ -85,15 +85,28 @@ private PercentileApprox(NullableAggregateFunctionParams
functionParams) {
@Override
public void checkLegalityBeforeTypeCoercion() {
- if (!getArgument(1).isConstant()) {
+ Expression quantile = getArgument(1);
+ if (!quantile.isConstant()) {
throw new AnalysisException(
"percentile_approx requires second parameter must be a
constant : " + this.toSql());
}
+ double value =
((org.apache.doris.nereids.trees.expressions.literal.Literal)
quantile).getDouble();
+ if (value < 0.0 || value > 1.0) {
+ throw new AnalysisException(
+ "percentile_approx quantile must be in [0, 1], but got " +
value + ": " + this.toSql());
+ }
if (arity() == 3) {
- if (!getArgument(2).isConstant()) {
+ Expression compressionArg = getArgument(2);
+ if (!compressionArg.isConstant()) {
throw new AnalysisException(
"percentile_approx requires the third parameter must
be a constant : " + this.toSql());
}
+ double compression =
+
((org.apache.doris.nereids.trees.expressions.literal.Literal)
compressionArg).getDouble();
+ if (compression < 2048.0 || compression > 10000.0) {
Review Comment:
This changes existing `percentile_approx` semantics. The BE state still
documents and implements that when compression is not in `[2048, 10000]`, it
falls back to the default `10000`; this PR now rejects those calls in FE before
BE can apply that behavior. That makes previously accepted queries like
`percentile_approx(x, 0.5, 10001)` fail analysis, and the weighted variant has
the same new check. If the intent is only to cache const arguments, please keep
the BE fallback semantics instead of turning out-of-range compression into an
analysis error.
##########
be/src/exprs/vectorized_agg_fn.cpp:
##########
@@ -274,7 +275,36 @@ Status AggFnEvaluator::prepare(RuntimeState* state, const
RowDescriptor& desc,
}
Status AggFnEvaluator::open(RuntimeState* state) {
- return VExpr::open(_input_exprs_ctxs, state);
+ RETURN_IF_ERROR(VExpr::open(_input_exprs_ctxs, state));
+ return _init_const_arguments();
+}
+
+Status AggFnEvaluator::_init_const_arguments() {
+ const auto& const_argument_indexes =
_function->get_const_argument_indexes();
+ if (const_argument_indexes.empty()) {
+ return Status::OK();
+ }
+
+ _const_argument_idx.assign(_input_exprs_ctxs.size(), false);
+ std::vector<ColumnWithTypeAndName> arguments;
+ arguments.reserve(const_argument_indexes.size());
+ for (const auto i : const_argument_indexes) {
+ if (i >= _input_exprs_ctxs.size()) [[unlikely]] {
Review Comment:
This validation also runs on merge evaluators. In Nereids,
`BUFFER_TO_RESULT`/`BUFFER_TO_BUFFER` aggregate expressions use the
aggregate-buffer child as the current phase argument and set `is_merge_agg`;
the BE side therefore opens a merge evaluator with only the serialized state
slot. `_is_merge` is populated from thrift, but `_init_const_arguments()` still
asks the function for original input const indexes and rejects index `1` for
functions such as `percentile`. That makes the global merge phase fail in
`open()` before it can deserialize/merge the partial state. Please skip
const-argument extraction for `_is_merge` evaluators, or make the contract
phase-aware so only input phases require the original const children.
##########
be/src/exprs/vectorized_agg_fn.cpp:
##########
@@ -274,7 +275,36 @@ Status AggFnEvaluator::prepare(RuntimeState* state, const
RowDescriptor& desc,
}
Status AggFnEvaluator::open(RuntimeState* state) {
- return VExpr::open(_input_exprs_ctxs, state);
+ RETURN_IF_ERROR(VExpr::open(_input_exprs_ctxs, state));
+ return _init_const_arguments();
+}
+
+Status AggFnEvaluator::_init_const_arguments() {
+ const auto& const_argument_indexes =
_function->get_const_argument_indexes();
+ if (const_argument_indexes.empty()) {
+ return Status::OK();
+ }
+
+ _const_argument_idx.assign(_input_exprs_ctxs.size(), false);
+ std::vector<ColumnWithTypeAndName> arguments;
+ arguments.reserve(const_argument_indexes.size());
+ for (const auto i : const_argument_indexes) {
+ if (i >= _input_exprs_ctxs.size()) [[unlikely]] {
+ return Status::InternalError("Aggregate function {} requires
invalid const argument {}",
+ _function->get_name(), i);
+ }
+ _const_argument_idx[i] = true;
+ if (!_input_exprs_ctxs[i]->root()->is_constant()) {
+ return Status::InternalError(
+ "Aggregate function {} requires argument {} to be
constant",
+ _function->get_name(), i);
+ }
+
+ ColumnWithTypeAndName argument;
+ RETURN_IF_ERROR(_input_exprs_ctxs[i]->execute_const_expr(argument));
+ arguments.emplace_back(std::move(argument));
+ }
+ return _function->set_const_arguments(arguments);
Review Comment:
`set_const_arguments()` is mutating the aggregate function object, but
evaluator clones share that object. `AggFnEvaluator::clone()` copies
`_function` as the same `shared_ptr`, and bucketed aggregation creates
per-instance evaluator clones specifically because evaluator state is mutated;
those local states can be initialized/opened concurrently. With this change
each clone calls `open()` and writes cached constants such as `_quantile`,
`_compression`, `_half_decay`, `_level`, or the `ai_agg` string refs into the
same shared function object. Even when every clone writes the same value, that
is an unsynchronized data race and it also means one evaluator's cached state
is not actually evaluator-local. Please make the cached constants immutable
before cloning, deep-clone the aggregate function, or keep per-evaluator owned
const-argument storage.
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/agg/PercentileArray.java:
##########
@@ -82,10 +83,25 @@ private PercentileArray(AggregateFunctionParams
functionParams) {
@Override
public void checkLegalityBeforeTypeCoercion() {
- if (!getArgument(1).isConstant()) {
+ Expression quantiles = getArgument(1);
+ if (!quantiles.isConstant()) {
throw new AnalysisException(
"percentile_array requires second parameter must be a
constant : " + this.toSql());
}
+ if (quantiles instanceof ArrayLiteral) {
+ ArrayLiteral arrayLiteral = (ArrayLiteral) quantiles;
+ for (org.apache.doris.nereids.trees.expressions.literal.Literal
item : arrayLiteral.getValue()) {
+ if (item instanceof NullLiteral) {
+ throw new AnalysisException(
+ "percentile_array quantile should not be null : "
+ this.toSql());
+ }
+ double value = item.getDouble();
Review Comment:
This has the same pre-coercion `getDouble()` problem as the scalar
percentile checks, but on the `ARRAY<DOUBLE>` argument. `PercentileArray`
implements `ExplicitlyCastableSignature`, so an array literal such as `['0.5',
'0.8']` can be matched by casting its elements to `DOUBLE`. This loop runs
before that coercion and calls `getDouble()` on the original
`StringLikeLiteral` elements; that method byte-encodes the string for ordering
instead of parsing it numerically, so valid numeric strings can be rejected as
out of range. Please validate the array values after element coercion/folding,
or explicitly reject non-numeric element literal types with a controlled
compatibility decision.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]