Hi Arrow community,
I’ve been working with Apache Arrow in Java and noticed that the *Arrow Dataset* API is still marked as *experimental* in the documentation (https://arrow.apache.org/docs/java/dataset.html). I also see that the latest Maven release for the Java dataset artifact on Maven Central is *18.3.x* (https://mvnrepository.com/artifact/org.apache.arrow/arrow-dataset).
A few specific points I’m hoping you could clarify:
1. *Stability / Production Readiness:*
   - What is the current stability level of the Java Dataset API?
   - Is it recommended for production use in its current state? If not, what are the known limitations or missing pieces?
2. *Roadmap / Planned Improvements:*
   - Are there plans to graduate this out of experimental status?
   - Are there specific features or compatibility goals (e.g., parity with the C++ dataset API, scanner pushdowns, partitioning, predicate filters, projection) planned for 19.x or beyond?
3. *Maintenance & Compatibility Guarantees:*
   - How does the team intend to maintain API stability going forward?
   - Will future releases maintain backward compatibility for the dataset API?
4. *Community Guidance:*
   - For Java users needing dataset-style ingest/scan/write workflows today, are there recommended patterns or extensions (e.g., using Flight, Parquet + Dataset bindings, or other tools)?
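For context, this is roughly the scan pattern I’m relying on today, following the example in the Java dataset documentation. The file path is a placeholder, and it assumes arrow-dataset plus a memory implementation (e.g., arrow-memory-netty) are on the classpath:

```java
import org.apache.arrow.dataset.file.FileFormat;
import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
import org.apache.arrow.dataset.jni.NativeMemoryPool;
import org.apache.arrow.dataset.scanner.ScanOptions;
import org.apache.arrow.dataset.scanner.Scanner;
import org.apache.arrow.dataset.source.Dataset;
import org.apache.arrow.dataset.source.DatasetFactory;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowReader;

public class DatasetScanExample {
    public static void main(String[] args) throws Exception {
        String uri = "file:///tmp/example.parquet"; // placeholder path
        ScanOptions options = new ScanOptions(/*batchSize*/ 32768);
        try (BufferAllocator allocator = new RootAllocator();
             DatasetFactory factory = new FileSystemDatasetFactory(
                     allocator, NativeMemoryPool.getDefault(), FileFormat.PARQUET, uri);
             Dataset dataset = factory.finish();
             Scanner scanner = dataset.newScan(options);
             ArrowReader reader = scanner.scanBatchesUnordered()) {
            // Iterate record batches; each batch is exposed via the reader's root.
            while (reader.loadNextBatch()) {
                System.out.println(reader.getVectorSchemaRoot().getRowCount());
            }
        }
    }
}
```

If there is a more idiomatic or forward-compatible way to structure this (particularly around projection and filter pushdown), I’d appreciate pointers.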
Thanks in advance — any guidance on current limitations, future plans, or
recommended usage patterns would be very helpful.
Best,
*Mayur*