andygrove opened a new issue, #4548:
URL: https://github.com/apache/datafusion-comet/issues/4548

   Triage pass for issues labeled `requires-triage`.
   
   - **Date:** 2026-06-01
   - **Issues processed:** 48 (42 triaged, 6 skipped, 0 failed)
   - **Priority counts applied:** `priority:critical` 11, `priority:high` 5, 
`priority:medium` 19, `priority:low` 7
   - **Guide:** 
[docs/source/contributor-guide/bug_triage.md](https://github.com/apache/datafusion-comet/blob/main/docs/source/contributor-guide/bug_triage.md)
   
   Labels have already been applied and `requires-triage` removed from the 
triaged issues. Please spot-check the calls below and close this issue when 
satisfied. Correct any label directly on the affected issue.
   
   ## Triaged
   
   ### priority:critical
   
   - JVM codegen dispatcher miscompiles map-typed (MapType) output 
([#4539](https://github.com/apache/datafusion-comet/issues/4539))
     - Area labels: `area:expressions`, `area:ffi`
     - Rationale: silent wrong result (map key corrupted, runs natively with no 
fallback); decision-tree step 1.
   - [Bug] replace returns wrong result for empty-string search 
([#4497](https://github.com/apache/datafusion-comet/issues/4497))
     - Area labels: `area:expressions`
     - Rationale: silent wrong result vs Spark for empty search string.
   - [Bug] CAST(complex AS STRING) does not honour 
spark.sql.legacy.castComplexTypesToString.enabled 
([#4492](https://github.com/apache/datafusion-comet/issues/4492))
     - Area labels: `area:expressions`
     - Rationale: ignores a config and produces wrong cast output (guide lists 
config-ignoring as critical).
   - [Bug] array_max and array_min disagree with Spark on NaN ordering 
([#4482](https://github.com/apache/datafusion-comet/issues/4482))
     - Area labels: `area:expressions`
     - Rationale: silent wrong result for NaN-containing arrays.
   - [Bug] array_distinct / array_union / array_except do not canonicalize NaN 
like Spark ([#4481](https://github.com/apache/datafusion-comet/issues/4481))
     - Area labels: `area:expressions`
     - Rationale: silent wrong result for NaN / signed-zero elements.
   - [Bug] str_to_map does not honour Spark 4.1.1 
legacy.truncateForEmptyRegexSplit 
([#4477](https://github.com/apache/datafusion-comet/issues/4477))
     - Area labels: `area:expressions`
     - Rationale: ignores a Spark 4.1.1 config, silently diverging when it is 
set.
   - [Bug] decode ignores Spark 4.0 legacyCharsets and legacyErrorAction flags 
([#4465](https://github.com/apache/datafusion-comet/issues/4465))
     - Area labels: `area:expressions`
     - Rationale: returns NULL where Spark substitutes or raises, a silent 
divergence in default and legacy modes.
   - [Bug] translate uses graphemes vs Spark code points and ignores U+0000 
deletion ([#4463](https://github.com/apache/datafusion-comet/issues/4463))
     - Area labels: `area:expressions`
     - Rationale: silent wrong result for combining marks and NUL-deletion 
semantics.
   - [Bug] make_date does not throw under spark.sql.ansi.enabled=true 
([#4451](https://github.com/apache/datafusion-comet/issues/4451))
     - Area labels: `area:expressions`
     - Rationale: returns NULL instead of the Spark ANSI error, a silent 
divergence when ANSI is on.
   - [Bug] next_day trims whitespace from dayOfWeek; Spark does not 
([#4450](https://github.com/apache/datafusion-comet/issues/4450))
     - Area labels: `area:expressions`
     - Rationale: returns a date where Spark returns NULL, an unconditional 
silent wrong result.
   - [Bug] next_day does not throw under spark.sql.ansi.enabled=true 
([#4449](https://github.com/apache/datafusion-comet/issues/4449))
     - Area labels: `area:expressions`
     - Rationale: returns NULL instead of the Spark ANSI error, a silent 
divergence when ANSI is on.
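
The NaN-related entries above (#4482, #4481) come down to comparison semantics. As a minimal sketch, assuming Spark's documented NaN semantics (NaN equals NaN and sorts above every other numeric value) and assuming, per the rationale for #4481, that `-0.0` and `0.0` count as one element, the expected behaviour can be modelled in Python. The function names are hypothetical and this is neither Comet nor Spark code:

```python
import math


def spark_like_array_max(values):
    """Max of a float list where NaN ranks above every other value (assumed Spark rule)."""
    if not values:
        return None
    # Sort key is (is_nan, value). NaN entries use 0.0 as a placeholder value
    # so the tuple comparison never has to compare NaN itself.
    return max(values, key=lambda v: (math.isnan(v), 0.0 if math.isnan(v) else v))


def canonical_distinct(values):
    """Order-preserving distinct that collapses all NaNs and treats -0.0 as 0.0 (assumption)."""
    seen = set()
    out = []
    for v in values:
        if math.isnan(v):
            key = "nan"   # every NaN maps to the same key
        elif v == 0.0:
            key = 0.0     # -0.0 == 0.0, so both map to the same key
        else:
            key = v
        if key not in seen:
            seen.add(key)
            out.append(v)
    return out
```

A plain IEEE 754 comparison is order-dependent here: Python's built-in `max([1.0, float("nan")])` returns `1.0`, while `max([float("nan"), 1.0])` returns NaN. That is why an implementation needs an explicit NaN rule to match Spark.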
   
   ### priority:high
   
   - CreateArray with nullability-divergent children panics in native 
make_array ([#4528](https://github.com/apache/datafusion-comet/issues/4528))
     - Area labels: `area:expressions`
     - Rationale: native panic (assertion failure in make_array); decision-tree 
step 2.
   - ConstantColumnVector inputs fail Comet export with "Comet execution only 
takes Arrow Arrays" 
([#4527](https://github.com/apache/datafusion-comet/issues/4527))
     - Area labels: `area:ffi`
     - Rationale: unhandled exception on a supported path (partition / constant 
columns, e.g. OPTIMIZE).
   - native shuffle: get_string should not panic on non-UTF-8 bytes (use lossy 
decode) ([#4521](https://github.com/apache/datafusion-comet/issues/4521))
     - Area labels: `area:shuffle`
     - Rationale: native panic in shuffle on non-UTF-8 string bytes.
   - CometScanRule: decline native V1 scans on object_store-unsupported 
filesystem schemes 
([#4520](https://github.com/apache/datafusion-comet/issues/4520))
     - Area labels: `area:scan`
     - Rationale: native scan crashes at execution on custom filesystem schemes 
instead of falling back.
   - [Bug] CAST(BinaryType AS StringType) uses unsafe from_utf8_unchecked 
(undefined behaviour) 
([#4488](https://github.com/apache/datafusion-comet/issues/4488))
     - Area labels: `area:expressions`
     - Rationale: Rust undefined behaviour / memory-safety risk on the cast 
path (see escalation note).
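
The two string-decoding entries above (#4521, #4488) both concern bytes that are not valid UTF-8. A rough Python analogue of the difference between strict and lossy decoding follows. It only illustrates the "lossy decode" idea named in the #4521 title; it is not Comet's Rust code, and the helper names are hypothetical:

```python
def decode_strict(raw: bytes) -> str:
    """Raise UnicodeDecodeError on invalid UTF-8 (fails loudly)."""
    return raw.decode("utf-8")


def decode_lossy(raw: bytes) -> str:
    """Replace each invalid byte sequence with U+FFFD instead of failing."""
    return raw.decode("utf-8", errors="replace")
```

Both variants validate their input. Rust's `from_utf8_unchecked`, named in #4488, skips validation entirely, which is why invalid bytes there are undefined behaviour rather than an error or a replacement character.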
   
   ### priority:medium
   
   - Native scan file-read failures should surface as Spark's 
FAILED_READ_FILE.NO_HINT 
([#4529](https://github.com/apache/datafusion-comet/issues/4529))
     - Area labels: `area:scan`
     - Rationale: error-compatibility gap (raw native message and missing path) 
with a fallback workaround.
   - Deep AND/OR predicate chains overflow protobuf recursion limit when the 
serialized plan is re-parsed 
([#4526](https://github.com/apache/datafusion-comet/issues/4526))
     - Area labels: `area:expressions`
     - Rationale: query fails on deep chains, but the trigger (>100 operands) 
is uncommon and the failure is a clean error.
   - Revert transition-heavy stages to Spark row-based execution 
([#4518](https://github.com/apache/datafusion-comet/issues/4518))
     - Area labels: none
     - Rationale: performance optimization for stages that accumulate many 
C2R/R2C transitions.
   - Native divide-by-zero in a dispatched ScalaUDF surfaces 
CometNativeException instead of SparkArithmeticException 
([#4517](https://github.com/apache/datafusion-comet/issues/4517))
     - Area labels: `area:expressions`
     - Rationale: wrong exception class under ANSI (the query errors either 
way; only the exception type differs).
   - CometProject and CometHashAggregate do not perform cross-sibling 
subexpression elimination over ScalaUDF 
([#4516](https://github.com/apache/datafusion-comet/issues/4516))
     - Area labels: `area:expressions`, `area:aggregation`
     - Rationale: result correct but UDF invoked N times instead of once, a 
performance gap for UDF-heavy queries.
   - DataFusion / DataFusion-Spark functions whose Arrow return type drifts 
from Spark catalyst's declared type 
([#4515](https://github.com/apache/datafusion-comet/issues/4515))
     - Area labels: `area:ffi`, `area:expressions`
     - Rationale: latent type-drift (masked by FFI re-stamping today) that 
errors when FFI hops are reduced.
   - map expression audit follow-ups (from #4478) 
([#4505](https://github.com/apache/datafusion-comet/issues/4505))
     - Area labels: `area:expressions`
     - Rationale: deferred audit follow-up tracker, mostly support-level / 
serde consistency work (see escalation note).
   - collection expression audit follow-ups (from #4473) 
([#4504](https://github.com/apache/datafusion-comet/issues/4504))
     - Area labels: `area:expressions`
     - Rationale: deferred audit follow-up tracker, mostly support-level / 
serde consistency work (see escalation note).
   - array expression audit follow-ups (from #4483) 
([#4503](https://github.com/apache/datafusion-comet/issues/4503))
     - Area labels: `area:expressions`
     - Rationale: deferred audit follow-up tracker, mostly support-level / 
serde consistency work (see escalation note).
   - date/time expression audit follow-ups (from #4448) 
([#4502](https://github.com/apache/datafusion-comet/issues/4502))
     - Area labels: `area:expressions`
     - Rationale: deferred audit follow-up tracker, mostly support-level / 
serde consistency work (see escalation note).
   - cast expression audit follow-ups (from #4493) 
([#4501](https://github.com/apache/datafusion-comet/issues/4501))
     - Area labels: `area:expressions`
     - Rationale: deferred audit follow-up tracker, mostly support-level / 
serde consistency work (see escalation note).
   - Math expression audit follow-ups (from #4486) 
([#4500](https://github.com/apache/datafusion-comet/issues/4500))
     - Area labels: `area:expressions`
     - Rationale: deferred audit follow-up tracker, mostly support-level / 
serde consistency work (see escalation note).
   - [Feature] CAST(MapType AS MapType) falls back even though native 
cast_map_to_map exists 
([#4491](https://github.com/apache/datafusion-comet/issues/4491))
     - Area labels: `area:expressions`
     - Rationale: missing cast support, falls back to Spark (correct but 
unaccelerated).
   - [Bug] try_mod falls back to Spark because CometRemainder rejects 
EvalMode.TRY ([#4484](https://github.com/apache/datafusion-comet/issues/4484))
     - Area labels: `area:expressions`
     - Rationale: feature gap, falls back to Spark; result correct via fallback.
   - [Feature] support size() for MapType inputs 
([#4472](https://github.com/apache/datafusion-comet/issues/4472))
     - Area labels: `area:expressions`
     - Rationale: missing expression support with a Spark fallback.
   - [Feature] support concat() for BinaryType and ArrayType inputs 
([#4471](https://github.com/apache/datafusion-comet/issues/4471))
     - Area labels: `area:expressions`
     - Rationale: missing expression support with a Spark fallback.
   - [Bug] CometCaseConversionBase gates compat inside convert() instead of 
getSupportLevel 
([#4467](https://github.com/apache/datafusion-comet/issues/4467))
     - Area labels: `area:expressions`
     - Rationale: the allowIncompatible config is bypassed for upper/lower, a 
functional config bug.
   - [Bug] bit_length and octet_length error natively for BinaryType input 
instead of falling back 
([#4464](https://github.com/apache/datafusion-comet/issues/4464))
     - Area labels: `area:expressions`
     - Rationale: native execution error on binary input instead of a clean 
fallback; uncommon input, workaround exists.
   - Bound CometS3CredentialDispatcher cache via refcounted handle lifecycle 
([#4456](https://github.com/apache/datafusion-comet/issues/4456))
     - Area labels: `area:scan`
     - Rationale: unbounded cache growth on long-running JVMs (eventual OOM), a 
conditional degradation.
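
For the recursion-limit entry above (#4526), the sketch below shows how nesting depth grows with operand count when a chain of binary AND nodes is built left-deep. The tuple representation and function names are hypothetical, and this report does not show how Comet actually serializes predicates. The point is only that depth scales linearly with the number of operands:

```python
from functools import reduce


def left_deep_and(operands):
    """Fold operands into nested binary nodes: (((a AND b) AND c) AND d) ..."""
    return reduce(lambda left, right: ("AND", left, right), operands)


def nesting_depth(expr):
    """Tree depth, computed iteratively so Python's own recursion limit is not hit."""
    deepest = 0
    stack = [(expr, 0)]
    while stack:
        node, depth = stack.pop()
        deepest = max(deepest, depth)
        if isinstance(node, tuple):
            _, left, right = node
            stack.append((left, depth + 1))
            stack.append((right, depth + 1))
    return deepest
```

With 101 operands the left-deep tree is 100 levels deep. That lines up with the `>100 operands` trigger quoted in the rationale if the parser's recursion limit is 100, which is an assumption here.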
   
   ### priority:low
   
   - CI lint check passed, but then later jobs failed with lint errors 
([#4545](https://github.com/apache/datafusion-comet/issues/4545))
     - Area labels: `area:ci`
     - Rationale: CI/tooling lint inconsistency (see escalation note).
   - PlanDataInjector does N x M canInject calls per operator tree 
([#4530](https://github.com/apache/datafusion-comet/issues/4530))
     - Area labels: none
     - Rationale: minor micro-optimization, explicitly no behavior change.
   - Do another audit sweep for string collation differences 
([#4496](https://github.com/apache/datafusion-comet/issues/4496))
     - Area labels: `area:expressions`
     - Rationale: process / tooling task (audit sweep), no concrete defect 
identified.
   - [Doc] CAST has no explicit TimeType branch (Spark 4.1) 
([#4490](https://github.com/apache/datafusion-comet/issues/4490))
     - Area labels: `area:expressions`
     - Rationale: documentation / support-level gap; the fallback itself is 
correct.
   - [Doc] CAST collated-string handling on Spark 4.0+ is implicit and untested 
([#4489](https://github.com/apache/datafusion-comet/issues/4489))
     - Area labels: `area:expressions`
     - Rationale: documentation / test gap; current fallback behavior is 
correct.
   - [Bug] width_bucket bypasses CometExpressionSerde framework 
([#4485](https://github.com/apache/datafusion-comet/issues/4485))
     - Area labels: `area:expressions`
     - Rationale: serde-framework consistency refactor; no wrong result or 
crash.
   - [Doc] decode does not appear in auto-generated compatibility docs 
([#4466](https://github.com/apache/datafusion-comet/issues/4466))
     - Area labels: `area:expressions`
     - Rationale: documentation gap (decode wired via shim, not a serde).
   
   ## Escalations to consider
   
   - [Bug] CAST(BinaryType AS StringType) uses unsafe from_utf8_unchecked 
(undefined behaviour) 
([#4488](https://github.com/apache/datafusion-comet/issues/4488))
     - Labeled `priority:high` for memory safety. Per the guide's "high crash 
that also produces wrong results silently" trigger, undefined behaviour that 
could silently corrupt output may warrant `priority:critical`.
   - CI lint check passed, but then later jobs failed with lint errors 
([#4545](https://github.com/apache/datafusion-comet/issues/4545))
     - Labeled `priority:low`. Per the guide, a CI issue that consistently 
blocks PR merges should escalate to `priority:medium`.
   - Audit follow-up trackers 
([#4505](https://github.com/apache/datafusion-comet/issues/4505), 
[#4504](https://github.com/apache/datafusion-comet/issues/4504), 
[#4503](https://github.com/apache/datafusion-comet/issues/4503), 
[#4502](https://github.com/apache/datafusion-comet/issues/4502), 
[#4501](https://github.com/apache/datafusion-comet/issues/4501), 
[#4500](https://github.com/apache/datafusion-comet/issues/4500))
     - Each bundles many sub-items of mixed severity, including Spark 4.0+ 
non-default-collation correctness gaps that silently diverge. Labeled 
`priority:medium` as trackers; the reviewer may want to split the collation 
sub-items into standalone `priority:critical` issues.
   
   ## Skipped — needs more info
   
   - [EPIC] Support Spark interval types (CalendarInterval / YearMonthInterval 
/ DayTimeInterval) and interval expressions 
([#4540](https://github.com/apache/datafusion-comet/issues/4540))
     - Open-ended EPIC umbrella; assigning a single priority is a roadmap 
decision rather than a mechanical triage call.
   - [EPIC] Provide JVM/codegen-dispatch implementations for Incompatible 
expressions so they never fall back by default 
([#4506](https://github.com/apache/datafusion-comet/issues/4506))
     - Open-ended EPIC umbrella; assigning a single priority is a roadmap 
decision rather than a mechanical triage call.
   - Discussion: Should Comet add geospatial (ST_*) function support? 
([#4455](https://github.com/apache/datafusion-comet/issues/4455))
     - Discussion / scope question needing community and maintainer input, not 
a triageable defect.
   - Bug triage results: 2026-05-26 
([#4441](https://github.com/apache/datafusion-comet/issues/4441))
     - Prior triage summary issue (auto-labeled `requires-triage`); meta, 
awaiting human review and closure, not a bug.
   - Bug triage results: 2026-05-18 
([#4359](https://github.com/apache/datafusion-comet/issues/4359))
     - Prior triage summary issue (auto-labeled `requires-triage`); meta, 
awaiting human review and closure, not a bug.
   - Bug triage results: 2026-05-11 
([#4287](https://github.com/apache/datafusion-comet/issues/4287))
     - Prior triage summary issue (auto-labeled `requires-triage`); meta, 
awaiting human review and closure, not a bug.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

