+1
> On Jun 4, 2026, at 12:57 AM, Alessandro Solimando
> <[email protected]> wrote:
>
> If it's not supported anymore it's already a good enough argument to
> replace it.
>
> The proposed plan looks reasonable to me.
>
> Alessandro
>
>> On Thu, Jun 4, 2026, 09:45 jensen <[email protected]> wrote:
>>
>> I vaguely recall that Arrow's official website also discourages the use of
>> Gandiva, and Gandiva is no longer maintained. If possible, I think we
>> should remove this dependency.
>>
>> Best regards,
>>
>> Zhen
>>
>> ---- Replied Message ----
>> | From | Cancai Cai<[email protected]> |
>> | Date | 6/4/2026 14:20 |
>> | To | <[email protected]> |
>> | Subject | Subject: [DISCUSS] Removing Gandiva from the Arrow adapter |
>>
>> Hi all,
>>
>> I would like to discuss the future of Gandiva in Calcite's Arrow adapter.
>> My preferred long-term direction is to remove the Gandiva dependency from
>> the adapter.
>>
>> The current adapter uses Arrow Java to read Arrow data, but relies on
>> Gandiva `Projector` and `Filter` for projection and filter execution.
>> Gandiva is a native LLVM-based runtime, and the Java module is a wrapper
>> around that native implementation. As a result, basic Arrow adapter queries
>> depend on native libraries, LLVM compatibility, platform packaging, and JDK
>> baseline details.
>>
>> This has become a practical maintenance problem when thinking about Arrow
>> dependency upgrades.
>>
>> The upgrade problem is not limited to Java bytecode compatibility. In our
>> experiments, newer Arrow versions failed at different layers. Arrow 18
>> requires a newer Java baseline than Calcite currently supports in its JDK 8
>> jobs. Arrow 17 and 16.1 still use Java 8 class files, but can hit Java
>> runtime API incompatibilities on JDK 8, such as
>> `ByteBuffer.flip(): ByteBuffer`. Arrow 16.0 avoids that Java runtime issue,
>> but exposed Gandiva native / LLVM symbol issues on Linux CI.
>>
>> This means that as long as `arrow-gandiva` is required for the adapter's
>> correctness path, upgrading the Arrow Java vector layer also requires
>> validating the native Gandiva stack across all CI platforms. Even when
>> `arrow-vector` itself is usable, `arrow-gandiva` can still block the
>> upgrade.
>>
>> For that reason, I think the adapter should make projection/filter
>> correctness independent of Gandiva first. Once the Java correctness path is
>> in place, Arrow vector upgrades can be evaluated separately from Gandiva
>> native compatibility.
>>
>> The direction I have in mind is a pure Java correctness path for the Arrow
>> adapter:
>>
>> * read Arrow data with `ArrowFileReader`, `VectorSchemaRoot`, and
>>   `ValueVector`;
>> * execute simple projections by reading selected vectors directly;
>> * execute the simple filters currently translated by `ArrowTranslator` with
>>   a Java evaluator;
>> * leave expressions that are not pushed into the adapter to Calcite's
>>   normal Enumerable / code generation path.
>>
>> With that model, Gandiva would no longer be required for correctness. A
>> staged migration could be:
>>
>> 1. Move no-filter simple projection away from Gandiva.
>> 2. Add Java evaluation for the simple filter subset currently supported by
>>    `ArrowTranslator`.
>> 3. Validate that existing Arrow adapter tests pass without invoking
>>    Gandiva.
>> 4. Remove `arrow-gandiva` from the adapter dependency set once the Java
>>    path covers the current behavior.
>>
>> The tradeoff is that Gandiva may be faster for supported expressions. But
>> for this adapter, I think correctness, portability, and dependency
>> stability should come first. If acceleration is needed later, it can be
>> discussed separately.
>>
>> Does this direction make sense to the community? Are there current use
>> cases that depend on Gandiva pushdown strongly enough that we should keep
>> the native dependency?
>>
>> Thanks,
>> Cancai
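
For illustration only, here is a rough sketch of what steps 1 and 2 of the staged migration might look like in plain Java. Everything in it is an assumption rather than existing Calcite or Arrow code: the class `JavaFilterSketch`, the `Condition` and `Op` types, and the use of nullable `Long[]` arrays as stand-ins for Arrow vectors (so that it runs without Arrow on the classpath) are all hypothetical. It also assumes the filter subset is a conjunction of `column OP literal` comparisons, which may not match exactly what `ArrowTranslator` emits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch, not Calcite or Arrow code. Nullable Long[] arrays
 * stand in for Arrow vectors so the example runs without Arrow.
 */
final class JavaFilterSketch {
  /** Illustrative comparison operators for a simple filter subset. */
  enum Op { EQ, LT, GT }

  /** One conjunct of the assumed form: column OP literal. */
  static final class Condition {
    final int column;
    final Op op;
    final long literal;

    Condition(int column, Op op, long literal) {
      this.column = column;
      this.op = op;
      this.literal = literal;
    }

    boolean matches(Long value) {
      if (value == null) {
        return false; // a comparison against NULL never selects the row
      }
      switch (op) {
        case EQ: return value.longValue() == literal;
        case LT: return value.longValue() < literal;
        case GT: return value.longValue() > literal;
        default: throw new AssertionError(op);
      }
    }
  }

  /** Returns the indices of rows that satisfy every condition (AND). */
  static List<Integer> selectRows(Long[][] columns, int rowCount,
      List<Condition> conditions) {
    List<Integer> selected = new ArrayList<>();
    for (int row = 0; row < rowCount; row++) {
      boolean keep = true;
      for (Condition c : conditions) {
        if (!c.matches(columns[c.column][row])) {
          keep = false;
          break;
        }
      }
      if (keep) {
        selected.add(row);
      }
    }
    return selected;
  }

  /** Simple projection: copies the requested fields of the selected rows. */
  static List<Long[]> project(Long[][] columns, List<Integer> rows,
      int[] fields) {
    List<Long[]> out = new ArrayList<>();
    for (int row : rows) {
      Long[] tuple = new Long[fields.length];
      for (int i = 0; i < fields.length; i++) {
        tuple[i] = columns[fields[i]][row];
      }
      out.add(tuple);
    }
    return out;
  }

  public static void main(String[] args) {
    // Two columns, four rows; column 0 contains a null.
    Long[][] columns = { {1L, 5L, null, 9L}, {10L, 20L, 30L, 40L} };
    List<Condition> where = Arrays.asList(new Condition(0, Op.GT, 2L));
    List<Integer> rows = selectRows(columns, 4, where);
    for (Long[] tuple : project(columns, rows, new int[] {1})) {
      System.out.println(Arrays.toString(tuple));
    }
  }
}
```

In a real adapter the arrays would presumably be replaced by reads from the vectors of a `VectorSchemaRoot`, and anything outside this subset would stay on Calcite's Enumerable path as the proposal describes. The sketch sticks to Java 8 syntax since the JDK 8 jobs are part of the constraint discussed above.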
