I really like Julian's idea of unwrapping Arrow objects out of the JDBC ResultSet, but I wonder whether the unwrap class has to be specific to the driver, or whether an interface could be designed to be shared by multiple drivers: for drivers based on Arrow, it would mean you could skip the serialization/deserialization from/to JDBC records entirely. If such an interface exists, I would propose adding it to the Arrow project, with Arrow-based products/projects in charge of adding support for it in their own JDBC drivers.
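A minimal sketch of what such a driver-agnostic interface might look like. Everything here is hypothetical — `ArrowBatchProducer` and its method names are invented for illustration and are not an existing Arrow or Avatica API; the fake `ResultSet` is a dynamic proxy only so the sketch can run without a real driver.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.SQLException;

public class UnwrapSketch {

    // Hypothetical driver-agnostic interface: any Arrow-backed JDBC driver
    // could implement this on its ResultSet, letting consumers pull columnar
    // batches directly and skip the row-by-row JDBC record path.
    interface ArrowBatchProducer {
        boolean hasNextBatch() throws SQLException;
        // A real design would return an Arrow VectorSchemaRoot here.
        Object nextBatch() throws SQLException;
    }

    // Consumer code: prefer the direct Arrow path when the driver supports
    // it, and fall back to ordinary row-wise JDBC access otherwise.
    static String choosePath(ResultSet rs) throws SQLException {
        if (rs.isWrapperFor(ArrowBatchProducer.class)) {
            ArrowBatchProducer p = rs.unwrap(ArrowBatchProducer.class);
            return "arrow-direct:" + (p != null);
        }
        return "jdbc-rows";
    }

    // Stand-in ResultSet built with a dynamic proxy, just so the sketch is
    // runnable without a database; it only answers the Wrapper methods.
    static ResultSet fakeArrowResultSet() {
        ArrowBatchProducer producer = new ArrowBatchProducer() {
            public boolean hasNextBatch() { return false; }
            public Object nextBatch() { return null; }
        };
        return (ResultSet) Proxy.newProxyInstance(
            ResultSet.class.getClassLoader(),
            new Class<?>[] { ResultSet.class },
            (proxy, method, args) -> {
                switch (method.getName()) {
                    case "isWrapperFor":
                        return ArrowBatchProducer.class.equals(args[0]);
                    case "unwrap":
                        return producer;
                    default:
                        throw new UnsupportedOperationException(method.getName());
                }
            });
    }

    public static void main(String[] args) throws Exception {
        System.out.println(choosePath(fakeArrowResultSet()));
    }
}
```

The point of going through `java.sql.Wrapper` (`isWrapperFor`/`unwrap`) is that it is already part of JDBC, so no new SPI is needed on the `Connection` or `Statement` side — only an agreed-upon interface class for drivers to expose.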
Laurent

On Tue, Oct 31, 2017 at 1:18 PM, Atul Dambalkar <atul.dambal...@xoriant.com> wrote:
> Thanks for your thoughts Julian. I think adding support for Arrow objects
> for the Avatica Remote Driver (AvaticaToArrowConverter) can certainly be
> taken up as a separate activity. And you are right, we will have to look at
> each specific JDBC driver to really optimize it individually.
>
> I would be curious whether there are any further inputs/comments from other
> dev folks on the JDBC adapter aspect.
>
> -Atul
>
> -----Original Message-----
> From: Julian Hyde [mailto:jh...@apache.org]
> Sent: Tuesday, October 31, 2017 11:12 AM
> To: dev@arrow.apache.org
> Subject: Re: JDBC Adapter for Apache-Arrow
>
> Sorry, I didn't read your email thoroughly enough. I was talking about the
> inverse (JDBC reading from Arrow), whereas you are talking about Arrow
> reading from JDBC. Your proposal makes perfect sense.
>
> JDBC is quite a chatty interface (a call for every column of every row,
> plus an occasional call to find out whether values are null, and objects
> such as strings and timestamps become Java heap objects), so for specific
> JDBC drivers it may be possible to optimize. For example, the Avatica
> remote driver receives row sets in an RPC response in protobuf format. It
> may be useful if the JDBC driver were able to expose a direct path from
> protobuf to Arrow. "ResultSet.unwrap(AvaticaToArrowConverter.class)"
> might be one way to achieve this.
>
> Julian
>
> > On Oct 31, 2017, at 10:41 AM, Atul Dambalkar <atul.dambal...@xoriant.com> wrote:
> >
> > Hi Julian,
> >
> > Thanks for your response. If I understand correctly (looking at other
> > adapters), a Calcite-Arrow adapter would provide a SQL front end for
> > in-memory Arrow data objects/structures. So from that perspective, are
> > you suggesting building the Calcite-Arrow adapter?
> >
> > In this case, what we are saying is to provide a mechanism for upstream
> > apps to be able to get/create Arrow objects/structures from a relational
> > database. This would also mean converting row-like data from a SQL
> > database to columnar Arrow data structures. The utility can perhaps make
> > use of JDBC's MetaData features to figure out the underlying DB schema
> > and define the Arrow columnar schema. Also, the underlying database in
> > this case would be any relational DB and hence persisted to disk, but the
> > Arrow objects, being in-memory, can be ephemeral.
> >
> > Please correct me if I am missing anything.
> >
> > -Atul
> >
> > -----Original Message-----
> > From: Julian Hyde [mailto:jhyde.apa...@gmail.com]
> > Sent: Monday, October 30, 2017 7:50 PM
> > To: dev@arrow.apache.org
> > Subject: Re: JDBC Adapter for Apache-Arrow
> >
> > How about writing an Arrow adapter for Calcite? I think it amounts to
> > the same thing - you would inherit Calcite's SQL parser and Avatica JDBC
> > stack.
> >
> > Would this database be ephemeral (i.e. would the data go away when you
> > close the connection)? If not, how would you know where to load the data
> > from?
> >
> > Julian
> >
> >> On Oct 30, 2017, at 6:17 PM, Atul Dambalkar <atul.dambal...@xoriant.com> wrote:
> >>
> >> Hi all,
> >>
> >> I wanted to open up a conversation here regarding developing a
> >> Java-based JDBC adapter for Apache Arrow. I had a preliminary
> >> discussion with Wes McKinney and Siddharth Teotia on this a couple of
> >> weeks ago.
> >>
> >> Basically, at a high level (over-simplified), this adapter/API will
> >> allow upstream apps to query RDBMS data over JDBC and get the JDBC
> >> objects converted to Arrow in-memory (JVM) objects/structures. The
> >> upstream utility can then work with Arrow objects/structures with the
> >> usual performance benefits.
> >> The utility will be very similar to the C++ implementation of
> >> "Convert a vector of row-wise data into an Arrow table" as described
> >> here: https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html
> >>
> >> How useful would this adapter be, and which other Apache projects would
> >> benefit from it? Based on the usability, we can open a JIRA for this
> >> activity and start looking into the implementation details.
> >>
> >> Regards,
> >> -Atul Dambalkar
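Atul's point about using JDBC's metadata to derive the Arrow columnar schema can be sketched as a type-mapping table. This is purely illustrative: a real adapter would walk `ResultSetMetaData` (`getColumnCount`, `getColumnName`, `getColumnType`) and emit Arrow `Field`/`ArrowType` objects, which are reduced here to plain type-name strings so the sketch has no dependency beyond the JDK's `java.sql.Types` constants.

```java
import java.sql.Types;

public class JdbcToArrowTypeSketch {

    // Illustrative mapping from java.sql.Types codes to Arrow-style type
    // names; the string values are stand-ins, not Arrow's canonical names.
    static String arrowTypeFor(int jdbcType) {
        switch (jdbcType) {
            case Types.BIT:
            case Types.BOOLEAN:   return "Bool";
            case Types.INTEGER:   return "Int32";
            case Types.BIGINT:    return "Int64";
            case Types.FLOAT:
            case Types.DOUBLE:    return "Float64";
            case Types.CHAR:
            case Types.VARCHAR:   return "Utf8";
            case Types.DATE:      return "Date";
            case Types.TIMESTAMP: return "Timestamp";
            default:
                throw new IllegalArgumentException(
                    "no mapping yet for JDBC type code " + jdbcType);
        }
    }

    public static void main(String[] args) {
        // e.g. an INTEGER column maps to a 32-bit Arrow integer vector
        System.out.println(arrowTypeFor(Types.INTEGER));
        System.out.println(arrowTypeFor(Types.VARCHAR));
    }
}
```

In a full implementation, this mapping (plus `ResultSetMetaData.isNullable`) is exactly the information needed to construct the Arrow schema before any rows are fetched.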
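The core of the adapter — and the source of the "chatty interface" cost Julian describes — is the row-to-column transposition, the Java analogue of the C++ row-wise conversion tutorial linked above. In the sketch below, the `Row` class and plain arrays stand in for JDBC rows and Arrow vectors; a real implementation would drive the same loop with `ResultSet.next()`, per-column getters such as `ResultSet.getInt()`, and `ResultSet.wasNull()`, writing into Arrow value vectors with validity bitmaps.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class RowToColumnarSketch {

    // Stand-in for one JDBC row; a null id models SQL NULL.
    static final class Row {
        final Integer id;
        final String name;
        Row(Integer id, String name) { this.id = id; this.name = name; }
    }

    // Stand-in for Arrow vectors: one array per column, plus an
    // Arrow-style validity bitmap for the nullable column.
    static final class Columns {
        final int[] ids;
        final BitSet idValidity;
        final String[] names;
        Columns(int n) {
            ids = new int[n];
            idValidity = new BitSet(n);
            names = new String[n];
        }
    }

    static Columns toColumnar(List<Row> rows) {
        Columns c = new Columns(rows.size());
        for (int i = 0; i < rows.size(); i++) {
            Row r = rows.get(i);
            if (r.id != null) {          // equivalent to !rs.wasNull()
                c.ids[i] = r.id;
                c.idValidity.set(i);     // mark the slot valid
            }
            c.names[i] = r.name;
        }
        return c;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row(1, "a"));
        rows.add(new Row(null, "b"));    // NULL id
        rows.add(new Row(3, "c"));
        Columns c = toColumnar(rows);
        System.out.println(c.ids[2] + " valid1=" + c.idValidity.get(1));
    }
}
```

Once data is in this columnar shape, downstream consumers get the usual Arrow benefits (vectorized scans, zero-copy sharing) regardless of which JDBC driver produced the rows.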