Hi Arrow community,
I’ve been working with Apache Arrow in Java and noticed that the *Arrow Dataset* API is still marked as *experimental* in the documentation (https://arrow.apache.org/docs/java/dataset.html). I also see that the latest Maven release for the Java dataset artifact on Maven Central is *18.3.x* (https://mvnrepository.com/artifact/org.apache.arrow/arrow-dataset).
A few specific points I’m hoping you could clarify:
1. *Stability / Production Readiness:*
   - What is the current stability level of the Java Dataset API?
   - Is it recommended for production use in its current state? If not, what are the known limitations or missing pieces?
2. *Roadmap / Planned Improvements:*
   - Are there plans to graduate this out of experimental status?
   - Are there specific features or compatibility goals (e.g., parity with the C++ dataset API, scanner pushdowns, partitioning, predicate filters, projection) planned for 19.x or beyond?
3. *Maintenance & Compatibility Guarantees:*
   - How does the team intend to maintain API stability going forward?
   - Will future releases maintain backward compatibility for the dataset API?
4. *Community Guidance:*
   - For Java users needing dataset-style ingest/scan/write workflows today, are there recommended patterns or extensions (e.g., using Flight, Parquet + Dataset bindings, or other tools)?
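For context, this is roughly the scan pattern I’m relying on today, following the example in the Java dataset documentation. The file path is a placeholder, and it assumes arrow-dataset plus a memory implementation (e.g., arrow-memory-netty) are on the classpath:

```java
import org.apache.arrow.dataset.file.FileFormat;
import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
import org.apache.arrow.dataset.jni.NativeMemoryPool;
import org.apache.arrow.dataset.scanner.ScanOptions;
import org.apache.arrow.dataset.scanner.Scanner;
import org.apache.arrow.dataset.source.Dataset;
import org.apache.arrow.dataset.source.DatasetFactory;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowReader;

public class DatasetScanExample {
    public static void main(String[] args) throws Exception {
        String uri = "file:///tmp/example.parquet"; // placeholder path
        ScanOptions options = new ScanOptions(/*batchSize*/ 32768);
        try (BufferAllocator allocator = new RootAllocator();
             DatasetFactory factory = new FileSystemDatasetFactory(
                     allocator, NativeMemoryPool.getDefault(), FileFormat.PARQUET, uri);
             Dataset dataset = factory.finish();
             Scanner scanner = dataset.newScan(options);
             ArrowReader reader = scanner.scanBatchesUnordered()) {
            // Iterate record batches; each batch is exposed via the reader's root.
            while (reader.loadNextBatch()) {
                System.out.println(reader.getVectorSchemaRoot().getRowCount());
            }
        }
    }
}
```

If there is a more idiomatic or forward-compatible way to structure this (particularly around projection and filter pushdown), I’d appreciate pointers.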
Thanks in advance — any guidance on current limitations, future plans, or
recommended usage patterns would be very helpful.
Best,
*Mayur*