This is really interesting stuff you've done in the example notebooks, Nicola & Michael. I wonder if you could benefit from the recently released Arrow Flight SQL: https://www.dremio.com/subsurface/arrow-flight-and-arrow-flight-sql-accelerating-data-movement/
I have asked Jacques about this a bit. It's meant to be a standard for
communicating SQL queries and metadata with Arrow. I'm not intimately
familiar with it, but it seems like it could be a good base from which to
build a Calcite backend for Arrow. They have a pretty thorough Java example
in the repository:

https://github.com/apache/arrow/blob/968e6ea488c939c0e1f2bfe339a5a9ed1aed603e/java/flight/flight-sql/src/test/java/org/apache/arrow/flight/sql/example/FlightSqlExample.java#L169-L180

On Mon, Jan 31, 2022 at 8:47 AM Michael Mior <[email protected]> wrote:

> You may want to keep an eye on CALCITE-2040 (
> https://issues.apache.org/jira/browse/CALCITE-2040). I have a student who
> is working on a Calcite adapter for Apache Arrow. We're basically hung up
> waiting on the Arrow team to release a compatible JAR. This still won't
> fully solve your problem, though, as the first version of the adapter is
> only capable of reading from Arrow files. However, the goal is eventually
> to allow passing a memory reference into the adapter so that it would be
> possible to make use of Arrow data that is constructed in memory
> elsewhere.
> --
> Michael Mior
> [email protected]
>
>
> On Sun, Jan 30, 2022 at 5:36 PM Nicola Vitucci <[email protected]>
> wrote:
>
> > Hi all,
> >
> > What would be the best way to use Calcite with Python? I've come up with
> > two potential solutions:
> >
> > - using the jaydebeapi package, to connect via the JDBC driver directly
> > from a JVM created via jpype;
> > - using Apache Arrow via the pyarrow package, to connect in basically the
> > same way but creating Arrow objects with JdbcToArrowUtils (and optionally
> > converting them to pandas).
> >
> > Although the former is more straightforward, the latter allows one to
> > achieve better performance (see [1], for instance), since that is exactly
> > what Arrow is for. I've created two Jupyter notebooks [2] showing each
> > solution. What would you recommend? Is there an even better approach?
> >
> > Thanks,
> >
> > Nicola
> >
> > [1] https://uwekorn.com/2020/12/30/fast-jdbc-revisited.html
> > [2] https://github.com/nvitucci/calcite-sparql/tree/v0.0.2/examples/python
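For reference, the first approach (jaydebeapi over Calcite's JDBC driver) can be sketched roughly as below. `org.apache.calcite.jdbc.Driver` and the `jdbc:calcite:model=...` URL form are Calcite's own, and `jaydebeapi.connect(jclassname, url, driver_args, jars)` is jaydebeapi's entry point. The helpers `calcite_url`, `connect` and `run_query`, the model path and the JAR list are hypothetical illustrations, not code from the notebooks in [2].

```python
# Sketch of the jaydebeapi approach: Calcite's JDBC driver running in a JVM
# that jaydebeapi starts through JPype. Helper names are hypothetical.
from typing import Any, List, Sequence, Tuple

# Driver class registered by Calcite's JDBC driver.
CALCITE_DRIVER = "org.apache.calcite.jdbc.Driver"


def calcite_url(model_path: str) -> str:
    """Build a Calcite JDBC URL that loads its schemas from a model file."""
    return f"jdbc:calcite:model={model_path}"


def connect(model_path: str, jars: Sequence[str]):
    """Open a DB-API connection; jaydebeapi starts the JVM if it isn't running.

    `jars` (hypothetical) must hold Calcite and all of its dependencies,
    since they form the JVM classpath.
    """
    import jaydebeapi  # imported lazily so the other helpers work without it

    return jaydebeapi.connect(CALCITE_DRIVER, calcite_url(model_path), jars=list(jars))


def run_query(conn, sql: str) -> Tuple[List[str], List[Tuple[Any, ...]]]:
    """Execute `sql` and return (column names, rows as Python tuples)."""
    curs = conn.cursor()
    try:
        curs.execute(sql)
        names = [d[0] for d in curs.description]
        return names, curs.fetchall()
    finally:
        curs.close()
```

Usage would look like `names, rows = run_query(connect("/path/to/model.json", jar_paths), "VALUES 1")`, where both the model path and `jar_paths` are placeholders for your own setup. Every value comes back as an individual Python object, which is the overhead the Arrow-based approach avoids.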

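To make the performance point more concrete: on the jaydebeapi path, rows arrive as Python tuples, with each value converted one at a time. Getting the columnar layout that pandas and Arrow use then takes a per-value Python loop like the sketch below. The JdbcToArrowUtils path instead fills Arrow column vectors on the JVM side and hands whole buffers to pyarrow. `rows_to_columns` is a hypothetical helper for illustration only.

```python
# Illustration of the row-to-column transpose that a DB-API (row-oriented)
# fetch leaves to Python; hypothetical helper, not from the notebooks.
from typing import Any, Dict, List, Sequence


def rows_to_columns(names: Sequence[str], rows: Sequence[Sequence[Any]]) -> Dict[str, List[Any]]:
    """Transpose row tuples into one list per column, keyed by column name."""
    if len(set(names)) != len(names):
        # A dict keyed by name would silently merge duplicate columns.
        raise ValueError("duplicate column names")
    columns: Dict[str, List[Any]] = {name: [] for name in names}
    for row in rows:
        if len(row) != len(names):
            raise ValueError("row width does not match column count")
        # One Python-level append per value: this is the per-cell cost.
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns
```

For example, `rows_to_columns(["id", "name"], [(1, "a"), (2, "b")])` returns `{"id": [1, 2], "name": ["a", "b"]}`.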