Re: Using Calcite with Python

Gavin Ray Mon, 31 Jan 2022 07:00:44 -0800

This is really interesting stuff you've done in the example notebooks.

Nicola & Michael, I wonder if you could benefit from the recently-released
Arrow Flight SQL?
https://www.dremio.com/subsurface/arrow-flight-and-arrow-flight-sql-accelerating-data-movement/


I have asked Jacques about this a bit -- it's meant to standardize how SQL
queries and metadata are communicated over Arrow.
I'm not intimately familiar with it, but it seems like it could be a good
base for building a Calcite backend for Arrow?

They have a pretty thorough Java example in the repository:
https://github.com/apache/arrow/blob/968e6ea488c939c0e1f2bfe339a5a9ed1aed603e/java/flight/flight-sql/src/test/java/org/apache/arrow/flight/sql/example/FlightSqlExample.java#L169-L180

On Mon, Jan 31, 2022 at 8:47 AM Michael Mior <[email protected]> wrote:

> You may want to keep an eye on CALCITE-2040 (
> https://issues.apache.org/jira/browse/CALCITE-2040). I have a student who
> is working on a Calcite adapter for Apache Arrow. We're basically hung up
> waiting on the Arrow team to release a compatible JAR. This still won't
> fully solve your problem though as the first version of the adapter is only
> capable of reading from Arrow files. However, the goal is eventually to
> allow passing a memory reference into the adapter so that it would be
> possible to make use of Arrow data which is constructed in-memory
> elsewhere.
> --
> Michael Mior
> [email protected]
>
>
> On Sun, Jan 30, 2022 at 5:36 PM Nicola Vitucci <[email protected]> wrote:
>
> > Hi all,
> >
> > What would be the best way to use Calcite with Python? I've come up with
> > two potential solutions:
> >
> > - using the jaydebeapi package, to connect via the JDBC driver directly
> > from a JVM created via jpype;
> > - using Apache Arrow via the pyarrow package, to connect in basically the
> > same way but creating Arrow objects with JdbcToArrowUtils (and optionally
> > converting them to Pandas).
> >
> > Although the former is more straightforward, the latter achieves better
> > performance (see [1] for instance), since that is exactly what Arrow is
> > for. I've created two Jupyter notebooks [2] showing each solution. What
> > would you recommend? Is there an even better approach?
> >
> > Thanks,
> >
> > Nicola
> >
> > [1] https://uwekorn.com/2020/12/30/fast-jdbc-revisited.html
> > [2] https://github.com/nvitucci/calcite-sparql/tree/v0.0.2/examples/python
> >
>
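
For anyone following along, the first option Nicola describes (jaydebeapi over the Calcite JDBC driver) might look roughly like the sketch below. It is not taken from the notebooks: the helper names, the model file and the JAR list are hypothetical placeholders. The only things assumed to be real are the Calcite driver class name, the `jdbc:calcite:model=` URL form, and `jaydebeapi.connect(jclassname, url, jars=...)`. jaydebeapi is imported lazily so the argument-building helper can be used without a JVM.

```python
from typing import Any, Dict, List, Sequence, Tuple

# JDBC driver class shipped with Calcite.
CALCITE_DRIVER = "org.apache.calcite.jdbc.Driver"


def calcite_connect_args(model_path: str, jar_paths: Sequence[str]) -> Dict[str, Any]:
    """Build the arguments passed to jaydebeapi.connect() (hypothetical helper)."""
    return {
        "jclassname": CALCITE_DRIVER,
        # Calcite reads its schema definition from a JSON model file.
        "url": f"jdbc:calcite:model={model_path}",
        "jars": list(jar_paths),
    }


def run_query(sql: str, model_path: str, jar_paths: Sequence[str]) -> List[Tuple]:
    """Run one query through the Calcite JDBC driver and return all rows."""
    # Third-party package; imported here so the helper above needs no JVM.
    import jaydebeapi

    args = calcite_connect_args(model_path, jar_paths)
    conn = jaydebeapi.connect(args["jclassname"], args["url"], jars=args["jars"])
    try:
        cursor = conn.cursor()
        try:
            cursor.execute(sql)
            return cursor.fetchall()
        finally:
            cursor.close()
    finally:
        conn.close()
```

Something like `run_query("SELECT 1", "model.json", ["calcite-core.jar", ...])` would then return plain Python rows, assuming the JAR list covers Calcite and its dependencies. The Arrow-based option would replace the cursor fetch with a JDBC-to-Arrow conversion on the JVM side, which is where the performance difference comes from.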
