[I] Bug triage results: 2026-05-18 [datafusion-comet]

via GitHub Mon, 18 May 2026 09:10:59 -0700


andygrove opened a new issue, #4359:
URL: https://github.com/apache/datafusion-comet/issues/4359


   # Bug triage results: 2026-05-18
   
   Triage pass over the open `requires-triage` queue, per the project [Bug Triage Guide](https://github.com/apache/datafusion-comet/blob/main/docs/source/contributor-guide/bug_triage.md).
   
   - Total issues processed: 20
   - Labels applied to: 19
   - Skipped: 1
   - `priority:high`: 2
   - `priority:medium`: 10
   - `priority:low`: 7
   
   Labels have already been applied and `requires-triage` removed from each 
issue listed under "Triaged". A reviewer should spot-check the calls and close 
this issue when satisfied. To correct a label, edit the affected issue directly.
   
   ## Triaged
   
   ### priority:high
   
   - AbstractMethodError: CometBroadcastExchangeExec missing sparkContext() 
from BroadcastExchangeLike 
([#4318](https://github.com/apache/datafusion-comet/issues/4318))
     - Area labels: none
     - Rationale: `AbstractMethodError` thrown on a supported code path (Comet 
0.16 + Spark 3.5.6 broadcast joins); per the guide, an unhandled exception on a 
supported path is `priority:high`.
   - Windows crash if frame overflow 
([#4307](https://github.com/apache/datafusion-comet/issues/4307))
     - Area labels: `area:expressions`
     - Rationale: Native engine throws `CometNativeException` on a supported 
window-function query (the `18446744073709551615` index points to a u64 
underflow in frame computation); a native crash on a supported path is 
`priority:high`.
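
For context on the #4307 rationale: `18446744073709551615` equals 2^64 - 1, the value an unsigned 64-bit subtraction yields when it wraps below zero. The sketch below emulates that wrap in Python with a modulus. The function name and the operands are illustrative assumptions and are not taken from the Comet source; the actual frame computation may differ.

```python
U64_MODULUS = 2**64  # number of distinct values an unsigned 64-bit integer can hold


def wrapping_sub_u64(a: int, b: int) -> int:
    """Emulate unsigned 64-bit subtraction that wraps around on underflow."""
    return (a - b) % U64_MODULUS


# Hypothetical operands: subtracting 1 from an index of 0.
underflowed = wrapping_sub_u64(0, 1)
print(underflowed)  # 18446744073709551615
```

An index of that size appearing in an error message is therefore a strong hint that a subtraction went below zero somewhere upstream, rather than that the data really contains that many rows.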
   
   ### priority:medium
   
   - Allocate Comet's parquet reader buffers from ArrowUtils.rootAllocator to 
enable zero-copy PyArrow UDF runner 
([#4294](https://github.com/apache/datafusion-comet/issues/4294))
     - Area labels: `area:scan`, `area:ffi`
     - Rationale: Performance optimization for the columnar Python runner with 
a working bulk-copy fallback today; matches the guide's "performance regression 
with workaround" criterion.
   - [FEATURE] Native scan support for VariantType columns (Iceberg + Spark 
4.0) ([#4295](https://github.com/apache/datafusion-comet/issues/4295))
     - Area labels: `area:scan`, `native_iceberg_compat`, `spark 4.0`
     - Rationale: Missing native VariantType support causes whole-query 
fallback; functional gap with Spark fallback workaround is `priority:medium`.
   - Implement JVM UDFs for all date/time expressions 
([#4311](https://github.com/apache/datafusion-comet/issues/4311))
     - Area labels: `area:expressions`
     - Rationale: Compatibility-feature gap: replace native date/time 
expressions with JVM UDFs for full Spark parity; functional gap with workaround 
is `priority:medium`.
   - Add support for native custom scalar UDFs 
([#4312](https://github.com/apache/datafusion-comet/issues/4312))
     - Area labels: `area:expressions`
     - Rationale: New user-facing feature for registering custom UDFs 
(prototype in PR #4283); missing feature with workaround (use Spark UDFs) is 
`priority:medium`.
   - Implement JVM UDFs for JSON expressions 
([#4313](https://github.com/apache/datafusion-comet/issues/4313))
     - Area labels: `area:expressions`
     - Rationale: Adds full Spark-compatible JSON expression support via JVM 
UDFs; missing-feature gap with workaround is `priority:medium`.
   - Writes to Apache Iceberg Tables 
([#4322](https://github.com/apache/datafusion-comet/issues/4322))
     - Area labels: `area:writer`, `native_iceberg_compat`
     - Rationale: New Iceberg write path is a major feature gap with the 
existing Spark write path as workaround; matches `priority:medium`.
   - Frequent CI failures for Spark 4.0.2 / JDK 21 
([#4327](https://github.com/apache/datafusion-comet/issues/4327))
     - Area labels: `area:ci`, `spark 4.0`
     - Rationale: A flaky CI test would normally be `priority:low`, but the 
title says "frequent" failures on the standard Spark 4.0.2 / JDK 21 build; per 
the guide's escalation rule for CI consistently blocking merges, escalated to 
`priority:medium`.
   - Credential Provider Support 
([#4332](https://github.com/apache/datafusion-comet/issues/4332))
     - Area labels: `area:scan`, `native_iceberg_compat`
     - Rationale: Missing pluggable credential provider for Iceberg scans (only 
static creds today); functional gap with workaround is `priority:medium`.
   - Comet JVM UDF implementations cannot be created in `spark` module 
([#4336](https://github.com/apache/datafusion-comet/issues/4336))
     - Area labels: `area:expressions`
     - Rationale: Module / shading structure prevents implementing UDFs that 
need `spark`-module access; broken feature with workaround (place UDFs in 
`common`) is `priority:medium`.
   - Implement TimeType support 
([#4288](https://github.com/apache/datafusion-comet/issues/4288))
     - Area labels: `area:expressions` (existing: `EPIC`, `spark 4.1`)
     - Rationale: Issue already carried `priority:medium` from a prior 
reviewer; this pass added `area:expressions` and removed `requires-triage`.
   
   ### priority:low
   
   - Make CI run on the contributor forks 
([#4289](https://github.com/apache/datafusion-comet/issues/4289))
     - Area labels: `area:ci`
     - Rationale: CI infrastructure rework with no functional impact; matches 
the guide's `priority:low` "tooling" example.
   - [DISCUSS] Simplify regex engine + incompatibility config model 
([#4310](https://github.com/apache/datafusion-comet/issues/4310))
     - Area labels: `area:expressions`
     - Rationale: Refactor / config-UX discussion with no underlying functional 
bug; user experience polish is `priority:low`.
   - Drop support for Spark 3.4 
([#4329](https://github.com/apache/datafusion-comet/issues/4329))
     - Area labels: none
     - Rationale: Project-policy / versioning discussion; tooling-and-process 
item maps to `priority:low`.
   - Enable spark.comet.exec.localTableScan.enabled when running Spark SQL 
tests ([#4347](https://github.com/apache/datafusion-comet/issues/4347))
     - Area labels: `spark sql tests`
     - Rationale: Test-infrastructure tweak so SQL suites exercise more of 
Comet; test-only / tooling change is `priority:low`.
   - native_datafusion: tests asserting parquet-mr's permissive 
overflow/narrowing behavior cannot be made to pass 
([#4352](https://github.com/apache/datafusion-comet/issues/4352))
     - Area labels: `area:scan`, `spark sql tests` (existing: 
`native_datafusion`)
     - Rationale: Architectural test-only mismatch; the workaround is to 
re-ignore the affected Spark tests. Test-only with workaround is `priority:low`.
   - native_datafusion (Spark 3.x): shim's ParquetSchemaConvert translation 
produces an extra SparkException cause-chain layer 
([#4354](https://github.com/apache/datafusion-comet/issues/4354))
     - Area labels: `area:scan`, `native_datafusion`
     - Rationale: Behavior difference visible only in Spark SQL test 
cause-chain assertions; tests stay ignored as a workaround. Test-only failure 
is `priority:low`.
   - Change UDF signature to use ColumnarValue rather than raw Arrow types 
([#4358](https://github.com/apache/datafusion-comet/issues/4358))
     - Area labels: `area:expressions`
     - Rationale: Internal API refactor with no user-facing functional bug; 
matches `priority:low` for tooling/internal cleanup.
   
   ## Escalations to consider
   
   - Frequent CI failures for Spark 4.0.2 / JDK 21 
([#4327](https://github.com/apache/datafusion-comet/issues/4327))
     - Escalated from `priority:low` (CI flake) to `priority:medium` per the 
guide's rule "A `priority:low` CI flake is blocking PR merges consistently → 
escalate to `priority:medium`". The reviewer should confirm whether these 
failures are in fact blocking merges; if not, downgrade to `priority:low`.
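
The escalation rule quoted above can be sketched as a small function. This is only an illustration of the rule as worded in this report; the function and parameter names are hypothetical and do not come from the triage guide or any tooling.

```python
def escalate_ci_flake(priority: str, blocks_merges_consistently: bool) -> str:
    """Return the priority after applying the CI-flake escalation rule."""
    # Only a low-priority flake that consistently blocks merges is bumped.
    if priority == "priority:low" and blocks_merges_consistently:
        return "priority:medium"
    return priority


print(escalate_ci_flake("priority:low", True))   # priority:medium
print(escalate_ci_flake("priority:low", False))  # priority:low
```

The second call corresponds to the downgrade path: if the reviewer finds the failures are not blocking merges, the issue stays at `priority:low`.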
   
   ## Skipped — needs more info
   
   - Bug triage results: 2026-05-11 
([#4287](https://github.com/apache/datafusion-comet/issues/4287))
     - This is the previous triage's summary issue. It is a meta issue, not a 
bug or feature request, and per this skill's rules ("Do not add labels to the 
summary issue itself") it should not carry a priority. The reviewer should 
close it when finished spot-checking the prior pass; `requires-triage` was left 
in place since this skill does not modify summary issues.
   
   ## Notes on label availability
   
   - The triage guide lists `spark 4` as a pre-existing area indicator, but the 
repo only has versioned labels (`spark 3.x`, `spark 4.0`, `spark 4.1`, `spark 
4.2`). Where applicable, the most specific existing version label was used 
(`spark 4.0` for #4295 and #4327). No new labels were created.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

