andygrove opened a new issue, #4359: URL: https://github.com/apache/datafusion-comet/issues/4359
# Bug triage results: 2026-05-18

Triage pass over the open `requires-triage` queue, per the project [Bug Triage Guide](https://github.com/apache/datafusion-comet/blob/main/docs/source/contributor-guide/bug_triage.md).

- Total issues processed: 20
- Labels applied to: 19
- Skipped: 1
- `priority:high`: 2
- `priority:medium`: 10
- `priority:low`: 7

Labels have already been applied and `requires-triage` removed from each issue listed under "Triaged". A reviewer should spot-check the calls and close this issue when satisfied. To correct a label, edit the affected issue directly.

## Triaged

### priority:high

- AbstractMethodError: CometBroadcastExchangeExec missing sparkContext() from BroadcastExchangeLike ([#4318](https://github.com/apache/datafusion-comet/issues/4318))
  - Area labels: none
  - Rationale: `AbstractMethodError` thrown on a supported code path (Comet 0.16 + Spark 3.5.6 broadcast joins); per the guide, an unhandled exception on a supported path is `priority:high`.
- Windows crash if frame overflow ([#4307](https://github.com/apache/datafusion-comet/issues/4307))
  - Area labels: `area:expressions`
  - Rationale: Native engine throws `CometNativeException` on a supported window-function query (the `18446744073709551615` index points to a u64 underflow in frame computation); a native crash on a supported path is `priority:high`.

### priority:medium

- Allocate Comet's parquet reader buffers from ArrowUtils.rootAllocator to enable zero-copy PyArrow UDF runner ([#4294](https://github.com/apache/datafusion-comet/issues/4294))
  - Area labels: `area:scan`, `area:ffi`
  - Rationale: Performance optimization for the columnar Python runner with a working bulk-copy fallback today; matches the guide's "performance regression with workaround" criterion.
- [FEATURE] Native scan support for VariantType columns (Iceberg + Spark 4.0) ([#4295](https://github.com/apache/datafusion-comet/issues/4295))
  - Area labels: `area:scan`, `native_iceberg_compat`, `spark 4.0`
  - Rationale: Missing native VariantType support causes whole-query fallback; functional gap with Spark fallback workaround is `priority:medium`.
- Implement JVM UDFs for all date/time expressions ([#4311](https://github.com/apache/datafusion-comet/issues/4311))
  - Area labels: `area:expressions`
  - Rationale: Compatibility-feature gap: replace native date/time expressions with JVM UDFs for full Spark parity; functional gap with workaround is `priority:medium`.
- Add support for native custom scalar UDFs ([#4312](https://github.com/apache/datafusion-comet/issues/4312))
  - Area labels: `area:expressions`
  - Rationale: New user-facing feature for registering custom UDFs (prototype in PR #4283); missing feature with workaround (use Spark UDFs) is `priority:medium`.
- Implement JVM UDFs for JSON expressions ([#4313](https://github.com/apache/datafusion-comet/issues/4313))
  - Area labels: `area:expressions`
  - Rationale: Adds full Spark-compatible JSON expression support via JVM UDFs; missing-feature gap with workaround is `priority:medium`.
- Writes to Apache Iceberg Tables ([#4322](https://github.com/apache/datafusion-comet/issues/4322))
  - Area labels: `area:writer`, `native_iceberg_compat`
  - Rationale: New Iceberg write path is a major feature gap with the existing Spark write path as workaround; matches `priority:medium`.
- Frequent CI failures for Spark 4.0.2 / JDK 21 ([#4327](https://github.com/apache/datafusion-comet/issues/4327))
  - Area labels: `area:ci`, `spark 4.0`
  - Rationale: A flaky CI test would normally be `priority:low`, but the title says "frequent" failures on the standard Spark 4.0.2 / JDK 21 build; per the guide's escalation rule for CI consistently blocking merges, escalated to `priority:medium`.
- Credential Provider Support ([#4332](https://github.com/apache/datafusion-comet/issues/4332))
  - Area labels: `area:scan`, `native_iceberg_compat`
  - Rationale: Missing pluggable credential provider for Iceberg scans (only static creds today); functional gap with workaround is `priority:medium`.
- Comet JVM UDF implementations cannot be created in `spark` module ([#4336](https://github.com/apache/datafusion-comet/issues/4336))
  - Area labels: `area:expressions`
  - Rationale: Module / shading structure prevents implementing UDFs that need `spark`-module access; broken feature with workaround (place UDFs in `common`) is `priority:medium`.
- Implement TimeType support ([#4288](https://github.com/apache/datafusion-comet/issues/4288))
  - Area labels: `area:expressions` (existing: `EPIC`, `spark 4.1`)
  - Rationale: Issue already carried `priority:medium` from a prior reviewer; this pass added `area:expressions` and removed `requires-triage`.

### priority:low

- Make CI run on the contributor forks ([#4289](https://github.com/apache/datafusion-comet/issues/4289))
  - Area labels: `area:ci`
  - Rationale: CI infrastructure rework with no functional impact; matches the guide's `priority:low` "tooling" example.
- [DISCUSS] Simplify regex engine + incompatibility config model ([#4310](https://github.com/apache/datafusion-comet/issues/4310))
  - Area labels: `area:expressions`
  - Rationale: Refactor / config-UX discussion with no underlying functional bug; user experience polish is `priority:low`.
- Drop support for Spark 3.4 ([#4329](https://github.com/apache/datafusion-comet/issues/4329))
  - Area labels: none
  - Rationale: Project-policy / versioning discussion; tooling-and-process item maps to `priority:low`.
- Enable spark.comet.exec.localTableScan.enabled when running Spark SQL tests ([#4347](https://github.com/apache/datafusion-comet/issues/4347))
  - Area labels: `spark sql tests`
  - Rationale: Test-infrastructure tweak so SQL suites exercise more of Comet; test-only / tooling change is `priority:low`.
- native_datafusion: tests asserting parquet-mr's permissive overflow/narrowing behavior cannot be made to pass ([#4352](https://github.com/apache/datafusion-comet/issues/4352))
  - Area labels: `area:scan`, `spark sql tests` (existing: `native_datafusion`)
  - Rationale: Architectural test-only mismatch; the workaround is to re-ignore the affected Spark tests. Test-only with workaround is `priority:low`.
- native_datafusion (Spark 3.x): shim's ParquetSchemaConvert translation produces an extra SparkException cause-chain layer ([#4354](https://github.com/apache/datafusion-comet/issues/4354))
  - Area labels: `area:scan`, `native_datafusion`
  - Rationale: Behavior difference visible only in Spark SQL test cause-chain assertions; tests stay ignored as a workaround. Test-only failure is `priority:low`.
- Change UDF signature to use ColumnarValue rather than raw Arrow types ([#4358](https://github.com/apache/datafusion-comet/issues/4358))
  - Area labels: `area:expressions`
  - Rationale: Internal API refactor with no user-facing functional bug; matches `priority:low` for tooling/internal cleanup.

## Escalations to consider

- Frequent CI failures for Spark 4.0.2 / JDK 21 ([#4327](https://github.com/apache/datafusion-comet/issues/4327))
  - Escalated from `priority:low` (CI flake) to `priority:medium` per the guide's rule "A `priority:low` CI flake is blocking PR merges consistently → escalate to `priority:medium`". The reviewer should confirm whether these failures are in fact blocking merges; if not, downgrade to `priority:low`.
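
As a rough illustration of the escalation rule quoted above, the following Rust sketch encodes it as a small function. The `Priority` enum, the `CiFlake` struct, and the `escalate` function are hypothetical names introduced here for illustration only; they are not part of the triage guide or of any Comet tooling.

```rust
/// Priority labels used in this report.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Priority {
    Low,
    Medium,
    High,
}

/// Hypothetical inputs for the escalation check. The field names are
/// illustrative assumptions and do not come from the triage guide.
struct CiFlake {
    /// Priority the issue would get before any escalation.
    base: Priority,
    /// Whether a reviewer has confirmed the flake consistently blocks PR merges.
    blocks_merges_consistently: bool,
}

/// Applies the quoted rule: a `priority:low` CI flake that consistently
/// blocks PR merges becomes `priority:medium`; any other case keeps its
/// base priority.
fn escalate(issue: &CiFlake) -> Priority {
    match (issue.base, issue.blocks_merges_consistently) {
        (Priority::Low, true) => Priority::Medium,
        (base, _) => base,
    }
}

fn main() {
    let blocking = CiFlake { base: Priority::Low, blocks_merges_consistently: true };
    let not_blocking = CiFlake { base: Priority::Low, blocks_merges_consistently: false };
    let already_high = CiFlake { base: Priority::High, blocks_merges_consistently: true };

    println!("{:?}", escalate(&blocking));
    println!("{:?}", escalate(&not_blocking));
    println!("{:?}", escalate(&already_high));
}
```

Under this reading, #4327 keeps `priority:medium` only if the reviewer confirms that the failures consistently block merges; otherwise the base `priority:low` applies.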
## Skipped: needs more info

- Bug triage results: 2026-05-11 ([#4287](https://github.com/apache/datafusion-comet/issues/4287))
  - This is the previous triage's summary issue. It is a meta issue, not a bug or feature request, and per this skill's rules ("Do not add labels to the summary issue itself") it should not carry a priority. The reviewer should close it when finished spot-checking the prior pass; `requires-triage` was left in place since this skill does not modify summary issues.

## Notes on label availability

- The triage guide lists `spark 4` as a pre-existing area indicator, but the repo only has versioned labels (`spark 3.x`, `spark 4.0`, `spark 4.1`, `spark 4.2`). Where applicable, the most specific existing version label was used (`spark 4.0` for #4295 and #4327). No new labels were created.
