I vaguely recall that the Arrow website also discourages using Gandiva and
that Gandiva is no longer maintained. If possible, I think we should remove
this dependency.



Best regards,

Zhen

---- Replied Message ----
| From | Cancai Cai |
| Date | 6/4/2026 14:20 |
| Subject | [DISCUSS] Removing Gandiva from the Arrow adapter |
Hi all,

I would like to discuss the future of Gandiva in Calcite's Arrow adapter.
My preferred long-term direction is to remove the Gandiva dependency from
the adapter.

The current adapter uses Arrow Java to read Arrow data, but relies on
Gandiva's `Projector` and `Filter` for projection and filter execution.
Gandiva is an LLVM-based native runtime, and its Java module is a JNI
wrapper around that native implementation. As a result, even basic Arrow
adapter queries depend on native libraries, LLVM compatibility,
platform-specific packaging, and the JDK baseline.

This has become a practical maintenance problem for Arrow dependency
upgrades.

The upgrade problem is not limited to Java bytecode compatibility. In our
experiments, newer Arrow versions failed at different layers. Arrow 18
requires a newer Java baseline than the JDK 8 jobs Calcite still runs.
Arrow 17 and 16.1 still ship Java 8 class files, but can hit Java runtime
API incompatibilities on JDK 8, such as calls to `ByteBuffer.flip()` linked
against the JDK 9+ signature that returns `ByteBuffer` (JDK 8 only has
`Buffer.flip()`, which returns `Buffer`). Arrow 16.0 avoids that Java
runtime issue, but exposed Gandiva native/LLVM symbol issues on Linux CI.
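To make the `ByteBuffer.flip()` failure concrete, here is a minimal sketch of the usual portable pattern (not code taken from Arrow; the class name `FlipPortability` is hypothetical). Code compiled on JDK 9+ with `-source 8 -target 8` links `buf.flip()` to `ByteBuffer.flip()`, which JDK 8 lacks, so it fails with `NoSuchMethodError` at runtime; casting to `java.nio.Buffer` first makes the call link to the method every JDK has.

```java
import java.nio.Buffer;
import java.nio.ByteBuffer;

final class FlipPortability {
  // Writes the bytes, then flips the buffer for reading in a way that links
  // on JDK 8 as well as JDK 9+: the cast forces the call site to reference
  // Buffer.flip()Ljava/nio/Buffer; instead of the JDK 9+ covariant override
  // ByteBuffer.flip()Ljava/nio/ByteBuffer;.
  static ByteBuffer writeAndFlip(byte[] data) {
    ByteBuffer buf = ByteBuffer.allocate(data.length);
    buf.put(data);
    ((Buffer) buf).flip();
    return buf;
  }

  public static void main(String[] args) {
    ByteBuffer buf = writeAndFlip(new byte[] {1, 2, 3});
    System.out.println(buf.remaining()); // prints 3
  }
}
```

Compiling with `javac --release 8` avoids the mismatch in the same way, because the compiler then resolves calls against the JDK 8 API signatures.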

This means that as long as `arrow-gandiva` is required for the adapter's
correctness path, upgrading the Arrow Java vector layer also requires
validating the native Gandiva stack across all CI platforms. Even when
`arrow-vector` itself is usable, `arrow-gandiva` can still block the
upgrade.

For that reason, I think the adapter should make projection/filter
correctness independent of Gandiva first. Once the Java correctness path is
in place, Arrow vector upgrades can be evaluated separately from Gandiva
native compatibility.

The direction I have in mind is a pure Java correctness path for the Arrow
adapter:

* read Arrow data with `ArrowFileReader`, `VectorSchemaRoot`, and
`ValueVector`;
* execute simple projections by reading selected vectors directly;
* execute the simple filters currently translated by `ArrowTranslator` with
a Java evaluator;
* leave expressions that are not pushed into the adapter to Calcite's
normal Enumerable / code generation path.
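A minimal, dependency-free sketch of this model, under stated assumptions: a batch is a hypothetical `Map<String, Integer[]>` standing in for a `VectorSchemaRoot` (a `null` entry plays the role of an unset validity bit), and the "simple filter" subset is assumed to be AND/OR of column-versus-literal comparisons; the subset `ArrowTranslator` actually emits may differ. In the adapter itself, the column lookups would go through `VectorSchemaRoot.getVector(...)` and typed vectors such as `IntVector`. Filtering produces a selection vector of row indices, and projection reads only the requested columns for those rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.IntStream;

final class ArrowJavaPathSketch {
  /** Comparison operators assumed to make up the "simple filter" subset. */
  enum Op { EQ, NE, LT, LE, GT, GE }

  /** Row predicate; returns true only when the SQL result is TRUE. */
  interface Predicate {
    boolean isTrue(Map<String, Integer[]> batch, int row);
  }

  /** column <op> literal; a NULL value makes the comparison UNKNOWN, i.e. not TRUE. */
  static Predicate compare(String column, Op op, int literal) {
    return (batch, row) -> {
      Integer value = batch.get(column)[row];
      if (value == null) {
        return false;
      }
      int c = Integer.compare(value, literal);
      switch (op) {
        case EQ: return c == 0;
        case NE: return c != 0;
        case LT: return c < 0;
        case LE: return c <= 0;
        case GT: return c > 0;
        case GE: return c >= 0;
        default: throw new AssertionError(op);
      }
    };
  }

  /** Conjunction: TRUE only if every term is TRUE. */
  static Predicate and(Predicate... terms) {
    return (batch, row) -> Arrays.stream(terms).allMatch(t -> t.isTrue(batch, row));
  }

  /** Disjunction: TRUE if any term is TRUE. */
  static Predicate or(Predicate... terms) {
    return (batch, row) -> Arrays.stream(terms).anyMatch(t -> t.isTrue(batch, row));
  }

  /** Filter: evaluates the predicate per row and returns a selection vector. */
  static int[] filter(Map<String, Integer[]> batch, int rowCount, Predicate p) {
    return IntStream.range(0, rowCount).filter(r -> p.isTrue(batch, r)).toArray();
  }

  /** Projection: reads only the requested columns, for the selected rows. */
  static List<Object[]> project(Map<String, Integer[]> batch, int[] rows, String... columns) {
    List<Object[]> out = new ArrayList<>();
    for (int r : rows) {
      Object[] tuple = new Object[columns.length];
      for (int i = 0; i < columns.length; i++) {
        tuple[i] = batch.get(columns[i])[r];
      }
      out.add(tuple);
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, Integer[]> batch = new LinkedHashMap<>();
    batch.put("id", new Integer[] {1, 2, 3, 4});
    batch.put("price", new Integer[] {10, null, 30, 40});
    batch.put("qty", new Integer[] {5, 6, 7, 8});

    // SELECT id, price WHERE price > 15 AND qty < 8
    int[] rows = filter(batch, 4,
        and(compare("price", Op.GT, 15), compare("qty", Op.LT, 8)));
    for (Object[] tuple : project(batch, rows, "id", "price")) {
      System.out.println(Arrays.toString(tuple)); // prints [3, 30]
    }
  }
}
```

Treating UNKNOWN as "not TRUE" at each comparison is enough here because AND and OR are monotone: a row is selected under SQL three-valued logic exactly when it is selected with UNKNOWN mapped to FALSE. Supporting NOT would require carrying full three-valued results.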

With that model, Gandiva would no longer be required for correctness. A
staged migration could be:

1. Move no-filter simple projection away from Gandiva.
2. Add Java evaluation for the simple filter subset currently supported by
`ArrowTranslator`.
3. Validate that existing Arrow adapter tests pass without invoking Gandiva.
4. Remove `arrow-gandiva` from the adapter dependency set once the Java
path covers the current behavior.
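Step 3 could be made mechanical with a guard that fails while Gandiva's classes are still loadable. This is a hypothetical sketch (`GandivaAbsenceCheck` is not an existing Calcite class); it assumes Gandiva's Java entry points are `org.apache.arrow.gandiva.evaluator.Projector` and `org.apache.arrow.gandiva.evaluator.Filter`, and it loads them with `initialize=false` so that the check itself does not run static initializers that might load native code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class GandivaAbsenceCheck {
  // Assumed fully qualified names of the Gandiva classes the adapter uses today.
  static final List<String> GANDIVA_CLASSES = Arrays.asList(
      "org.apache.arrow.gandiva.evaluator.Projector",
      "org.apache.arrow.gandiva.evaluator.Filter");

  /** Returns true if the class can be found, without initializing it. */
  static boolean isLoadable(String name, ClassLoader loader) {
    try {
      Class.forName(name, false, loader);
      return true;
    } catch (ClassNotFoundException e) {
      return false;
    }
  }

  /** Returns the Gandiva classes that are still visible to the given loader. */
  static List<String> loadableGandivaClasses(ClassLoader loader) {
    return GANDIVA_CLASSES.stream()
        .filter(name -> isLoadable(name, loader))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> found =
        loadableGandivaClasses(GandivaAbsenceCheck.class.getClassLoader());
    if (!found.isEmpty()) {
      throw new IllegalStateException("arrow-gandiva is still on the classpath: " + found);
    }
    System.out.println("No Gandiva classes on the classpath");
  }
}
```

Running the existing adapter tests with `arrow-gandiva` excluded from the test classpath is the stronger form of the same check, since any remaining Gandiva call would then fail outright.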

The tradeoff is that Gandiva may be faster for supported expressions. But
for this adapter, I think correctness, portability, and dependency
stability should come first. If acceleration is needed later, it can be
discussed separately.

Does this direction make sense to the community? Are there current use
cases that depend on Gandiva pushdown strongly enough that we should keep
the native dependency?

Thanks,
Cancai
