I really like Julian's idea of unwrapping Arrow objects out of the JDBC ResultSet, but I wonder whether the unwrap class has to be specific to the driver, or whether an interface could be designed to be shared by multiple drivers: for drivers based on Arrow, it would mean you could skip the serialization/deserialization from/to JDBC records entirely. If such an interface exists, I would propose adding it to the Arrow project, with Arrow-based products/projects in charge of adding support for it in their own JDBC drivers.
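A minimal sketch of what such a driver-agnostic interface might look like. Everything here is hypothetical — `ArrowBatchProducer` and its method names are invented for illustration and are not an existing Arrow or Avatica API; the fake `ResultSet` is a dynamic proxy only so the sketch can run without a real driver.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.SQLException;

public class UnwrapSketch {

    // Hypothetical driver-agnostic interface: any Arrow-backed JDBC driver
    // could implement this on its ResultSet, letting consumers pull columnar
    // batches directly and skip the row-by-row JDBC record path.
    interface ArrowBatchProducer {
        boolean hasNextBatch() throws SQLException;
        // A real design would return an Arrow VectorSchemaRoot here.
        Object nextBatch() throws SQLException;
    }

    // Consumer code: prefer the direct Arrow path when the driver supports
    // it, and fall back to ordinary row-wise JDBC access otherwise.
    static String choosePath(ResultSet rs) throws SQLException {
        if (rs.isWrapperFor(ArrowBatchProducer.class)) {
            ArrowBatchProducer p = rs.unwrap(ArrowBatchProducer.class);
            return "arrow-direct:" + (p != null);
        }
        return "jdbc-rows";
    }

    // Stand-in ResultSet built with a dynamic proxy, just so the sketch is
    // runnable without a database; it only answers the Wrapper methods.
    static ResultSet fakeArrowResultSet() {
        ArrowBatchProducer producer = new ArrowBatchProducer() {
            public boolean hasNextBatch() { return false; }
            public Object nextBatch() { return null; }
        };
        return (ResultSet) Proxy.newProxyInstance(
            ResultSet.class.getClassLoader(),
            new Class<?>[] { ResultSet.class },
            (proxy, method, args) -> {
                switch (method.getName()) {
                    case "isWrapperFor":
                        return ArrowBatchProducer.class.equals(args[0]);
                    case "unwrap":
                        return producer;
                    default:
                        throw new UnsupportedOperationException(method.getName());
                }
            });
    }

    public static void main(String[] args) throws Exception {
        System.out.println(choosePath(fakeArrowResultSet()));
    }
}
```

The point of going through `java.sql.Wrapper` (`isWrapperFor`/`unwrap`) is that it is already part of JDBC, so no new SPI is needed on the `Connection` or `Statement` side — only an agreed-upon interface class for drivers to expose.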
Laurent

On Tue, Oct 31, 2017 at 1:18 PM, Atul Dambalkar <atul.dambal...@xoriant.com> wrote:
> Thanks for your thoughts Julian. I think adding support for Arrow objects
> for the Avatica Remote Driver (AvaticaToArrowConverter) can certainly be
> taken up as a separate activity. And you are right, we will have to look at
> each specific JDBC driver to really optimize it individually.
>
> I would be curious whether there are any further inputs/comments from other
> dev folks on the JDBC adapter aspect.
>
> -Atul
>
> -----Original Message-----
> From: Julian Hyde [mailto:jh...@apache.org]
> Sent: Tuesday, October 31, 2017 11:12 AM
> To: dev@arrow.apache.org
> Subject: Re: JDBC Adapter for Apache-Arrow
>
> Sorry, I didn't read your email thoroughly enough. I was talking about the
> inverse (JDBC reading from Arrow), whereas you are talking about Arrow
> reading from JDBC. Your proposal makes perfect sense.
>
> JDBC is quite a chatty interface (a call for every column of every row,
> plus an occasional call to find out whether values are null, and objects
> such as strings and timestamps become Java heap objects), so for specific
> JDBC drivers it may be possible to optimize. For example, the Avatica
> remote driver receives row sets in an RPC response in protobuf format. It
> may be useful if the JDBC driver were able to expose a direct path from
> protobuf to Arrow. "ResultSet.unwrap(AvaticaToArrowConverter.class)"
> might be one way to achieve this.
>
> Julian
>
> > On Oct 31, 2017, at 10:41 AM, Atul Dambalkar <atul.dambal...@xoriant.com> wrote:
> >
> > Hi Julian,
> >
> > Thanks for your response. If I understand correctly (looking at other
> > adapters), a Calcite-Arrow adapter would provide a SQL front end for
> > in-memory Arrow data objects/structures. So from that perspective, are
> > you suggesting building the Calcite-Arrow adapter?
> >
> > In this case, what we are saying is to provide a mechanism for upstream
> > apps to be able to get/create Arrow objects/structures from a relational
> > database. This would also mean converting row-like data from a SQL
> > database to columnar Arrow data structures. The utility can perhaps make
> > use of JDBC's MetaData features to figure out the underlying DB schema
> > and define the Arrow columnar schema. Also, the underlying database in
> > this case would be any relational DB and hence persisted to disk, but the
> > Arrow objects, being in-memory, can be ephemeral.
> >
> > Please correct me if I am missing anything.
> >
> > -Atul
> >
> > -----Original Message-----
> > From: Julian Hyde [mailto:jhyde.apa...@gmail.com]
> > Sent: Monday, October 30, 2017 7:50 PM
> > To: dev@arrow.apache.org
> > Subject: Re: JDBC Adapter for Apache-Arrow
> >
> > How about writing an Arrow adapter for Calcite? I think it amounts to
> > the same thing - you would inherit Calcite's SQL parser and Avatica JDBC
> > stack.
> >
> > Would this database be ephemeral (i.e. would the data go away when you
> > close the connection)? If not, how would you know where to load the data
> > from?
> >
> > Julian
> >
> >> On Oct 30, 2017, at 6:17 PM, Atul Dambalkar <atul.dambal...@xoriant.com> wrote:
> >>
> >> Hi all,
> >>
> >> I wanted to open up a conversation here regarding developing a
> >> Java-based JDBC adapter for Apache Arrow. I had a preliminary
> >> discussion with Wes McKinney and Siddharth Teotia on this a couple of
> >> weeks ago.
> >>
> >> Basically, at a high level (over-simplified), this adapter/API will
> >> allow upstream apps to query RDBMS data over JDBC and get the JDBC
> >> objects converted to Arrow in-memory (JVM) objects/structures. The
> >> upstream utility can then work with Arrow objects/structures with the
> >> usual performance benefits.
> >> The utility will be very similar to the C++ implementation of
> >> "Convert a vector of row-wise data into an Arrow table" as described
> >> here: https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html
> >>
> >> How useful would this adapter be, and which other Apache projects would
> >> benefit from it? Based on the usability, we can open a JIRA for this
> >> activity and start looking into the implementation details.
> >>
> >> Regards,
> >> -Atul Dambalkar
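Atul's point about using JDBC's metadata to derive the Arrow columnar schema can be sketched as a type-mapping table. This is purely illustrative: a real adapter would walk `ResultSetMetaData` (`getColumnCount`, `getColumnName`, `getColumnType`) and emit Arrow `Field`/`ArrowType` objects, which are reduced here to plain type-name strings so the sketch has no dependency beyond the JDK's `java.sql.Types` constants.

```java
import java.sql.Types;

public class JdbcToArrowTypeSketch {

    // Illustrative mapping from java.sql.Types codes to Arrow-style type
    // names; the string values are stand-ins, not Arrow's canonical names.
    static String arrowTypeFor(int jdbcType) {
        switch (jdbcType) {
            case Types.BIT:
            case Types.BOOLEAN:   return "Bool";
            case Types.INTEGER:   return "Int32";
            case Types.BIGINT:    return "Int64";
            case Types.FLOAT:
            case Types.DOUBLE:    return "Float64";
            case Types.CHAR:
            case Types.VARCHAR:   return "Utf8";
            case Types.DATE:      return "Date";
            case Types.TIMESTAMP: return "Timestamp";
            default:
                throw new IllegalArgumentException(
                    "no mapping yet for JDBC type code " + jdbcType);
        }
    }

    public static void main(String[] args) {
        // e.g. an INTEGER column maps to a 32-bit Arrow integer vector
        System.out.println(arrowTypeFor(Types.INTEGER));
        System.out.println(arrowTypeFor(Types.VARCHAR));
    }
}
```

In a full implementation, this mapping (plus `ResultSetMetaData.isNullable`) is exactly the information needed to construct the Arrow schema before any rows are fetched.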
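The core of the adapter — and the source of the "chatty interface" cost Julian describes — is the row-to-column transposition, the Java analogue of the C++ row-wise conversion tutorial linked above. In the sketch below, the `Row` class and plain arrays stand in for JDBC rows and Arrow vectors; a real implementation would drive the same loop with `ResultSet.next()`, per-column getters such as `ResultSet.getInt()`, and `ResultSet.wasNull()`, writing into Arrow value vectors with validity bitmaps.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class RowToColumnarSketch {

    // Stand-in for one JDBC row; a null id models SQL NULL.
    static final class Row {
        final Integer id;
        final String name;
        Row(Integer id, String name) { this.id = id; this.name = name; }
    }

    // Stand-in for Arrow vectors: one array per column, plus an
    // Arrow-style validity bitmap for the nullable column.
    static final class Columns {
        final int[] ids;
        final BitSet idValidity;
        final String[] names;
        Columns(int n) {
            ids = new int[n];
            idValidity = new BitSet(n);
            names = new String[n];
        }
    }

    static Columns toColumnar(List<Row> rows) {
        Columns c = new Columns(rows.size());
        for (int i = 0; i < rows.size(); i++) {
            Row r = rows.get(i);
            if (r.id != null) {          // equivalent to !rs.wasNull()
                c.ids[i] = r.id;
                c.idValidity.set(i);     // mark the slot valid
            }
            c.names[i] = r.name;
        }
        return c;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row(1, "a"));
        rows.add(new Row(null, "b"));    // NULL id
        rows.add(new Row(3, "c"));
        Columns c = toColumnar(rows);
        System.out.println(c.ids[2] + " valid1=" + c.idValidity.get(1));
    }
}
```

Once data is in this columnar shape, downstream consumers get the usual Arrow benefits (vectorized scans, zero-copy sharing) regardless of which JDBC driver produced the rows.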