Re: JDBC Adapter for Apache-Arrow

Julian Hyde Tue, 31 Oct 2017 11:12:56 -0700

Sorry, I didn’t read your email thoroughly enough. I was talking about the 
inverse (JDBC reading from Arrow) whereas you are talking about Arrow reading 
from JDBC. Your proposal makes perfect sense.


JDBC is quite a chatty interface (a call for every column of every row, plus an 
occasional call to find out whether values are null, and objects such as 
strings and timestamps become Java heap objects), so for specific JDBC drivers 
it may be possible to optimize. For example, the Avatica remote driver receives 
row sets in an RPC response in protobuf format. It may be useful if the JDBC 
driver were able to expose a direct path from protobuf to Arrow. 
"ResultSet.unwrap(AvaticaToArrowConverter.class)" might be one way to achieve 
this.

Julian
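
To make the "chatty" access pattern above concrete, here is a minimal sketch of 
the generic path: one getter call per cell, a wasNull() check after it, and a 
heap object for every string. It uses only standard java.sql interfaces. The 
Column class is a hypothetical stand-in for an Arrow vector (it is not the 
Arrow API), and fakeResultSet is a hypothetical in-memory ResultSet built with 
a dynamic proxy so the example runs without a database or driver. It handles 
only INTEGER and string columns and is not the adapter proposed in this thread.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Types;
import java.util.ArrayList;
import java.util.List;

class RowsToColumns {

    /** Hypothetical stand-in for an Arrow vector: values plus a null flag per row. */
    static final class Column {
        final String name;
        final List<Object> values = new ArrayList<>();
        final List<Boolean> isNull = new ArrayList<>();
        Column(String name) { this.name = name; }
    }

    /** Pivots a row-wise ResultSet into columns, making one JDBC call per cell. */
    static List<Column> toColumns(ResultSet rs) throws SQLException {
        ResultSetMetaData md = rs.getMetaData();
        int n = md.getColumnCount();
        List<Column> cols = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            cols.add(new Column(md.getColumnLabel(i)));
        }
        while (rs.next()) {
            for (int i = 1; i <= n; i++) {
                Object v;
                if (md.getColumnType(i) == Types.INTEGER) {
                    v = rs.getInt(i);      // primitive getter: returns 0 for SQL NULL
                } else {
                    v = rs.getString(i);   // allocates a heap object per cell
                }
                boolean wasNull = rs.wasNull();  // the extra call to detect NULL
                Column c = cols.get(i - 1);
                c.values.add(wasNull ? null : v);
                c.isNull.add(wasNull);
            }
        }
        return cols;
    }

    /** Hypothetical in-memory ResultSet supporting only the calls used above. */
    static ResultSet fakeResultSet(String[] names, int[] types, Object[][] rows) {
        ClassLoader cl = RowsToColumns.class.getClassLoader();
        ResultSetMetaData md = (ResultSetMetaData) Proxy.newProxyInstance(
            cl, new Class<?>[] {ResultSetMetaData.class}, (proxy, method, args) -> {
                switch (method.getName()) {
                    case "getColumnCount": return names.length;
                    case "getColumnLabel": return names[((Integer) args[0]) - 1];
                    case "getColumnType": return types[((Integer) args[0]) - 1];
                    default: throw new UnsupportedOperationException(method.getName());
                }
            });
        int[] cursor = {-1};          // index of the current row
        boolean[] lastNull = {false}; // whether the last value read was NULL
        return (ResultSet) Proxy.newProxyInstance(
            cl, new Class<?>[] {ResultSet.class}, (proxy, method, args) -> {
                switch (method.getName()) {
                    case "getMetaData": return md;
                    case "next": return ++cursor[0] < rows.length;
                    case "wasNull": return lastNull[0];
                    case "getInt":
                    case "getString": {
                        Object v = rows[cursor[0]][((Integer) args[0]) - 1];
                        lastNull[0] = (v == null);
                        if (v == null && method.getName().equals("getInt")) {
                            return 0;  // mimic JDBC: getInt yields 0 for NULL
                        }
                        return v;
                    }
                    default: throw new UnsupportedOperationException(method.getName());
                }
            });
    }

    public static void main(String[] args) throws SQLException {
        ResultSet rs = fakeResultSet(
            new String[] {"id", "name"},
            new int[] {Types.INTEGER, Types.VARCHAR},
            new Object[][] {{1, "a"}, {2, null}, {null, "c"}});
        for (Column c : toColumns(rs)) {
            System.out.println(c.name + " = " + c.values);
        }
    }
}
```

For three rows and two columns, the loop makes twelve cell-level calls (six 
getters and six wasNull() checks) on top of the next() calls. A driver-specific 
path such as the unwrap() idea above would presumably avoid that per-cell 
overhead by handing over whole batches instead.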




> On Oct 31, 2017, at 10:41 AM, Atul Dambalkar <[email protected]> 
> wrote:
> 
> Hi Julian,
> 
> Thanks for your response. If I understand correctly (looking at other 
> adapters), a Calcite-Arrow adapter would provide a SQL front end for 
> in-memory Arrow data objects/structures. So from that perspective, are you 
> suggesting building the Calcite-Arrow adapter? 
> 
> In this case, what we are proposing is a mechanism for upstream apps to 
> get/create Arrow objects/structures from a relational database. This would 
> also mean converting row-like data from a SQL database to columnar Arrow 
> data structures. The utility could perhaps use JDBC's metadata features to 
> figure out the underlying DB schema and define the Arrow columnar schema. 
> Also, the underlying database in this case would be any relational DB and 
> hence would be persisted to disk, whereas the Arrow objects, being 
> in-memory, can be ephemeral. 
> 
> Please correct me if I am missing anything. 
> 
> -Atul
> 
> -----Original Message-----
> From: Julian Hyde [mailto:[email protected]] 
> Sent: Monday, October 30, 2017 7:50 PM
> To: [email protected]
> Subject: Re: JDBC Adapter for Apache-Arrow
> 
> How about writing an Arrow adapter for Calcite? I think it amounts to the 
> same thing - you would inherit Calcite’s SQL parser and Avatica JDBC stack. 
> 
> Would this database be ephemeral (i.e. would the data go away when you close 
> the connection)? If not, how would you know where to load the data from?
> 
> Julian
> 
>> On Oct 30, 2017, at 6:17 PM, Atul Dambalkar <[email protected]> 
>> wrote:
>> 
>> Hi all,
>> 
>> I wanted to open up a conversation here regarding developing a Java-based 
>> JDBC Adapter for Apache Arrow. I had a preliminary discussion with Wes 
>> McKinney and Siddharth Teotia on this a couple of weeks ago.
>> 
>> Basically, at a high level (over-simplified), this adapter/API will allow 
>> upstream apps to query RDBMS data over JDBC and get the JDBC objects 
>> converted to Arrow in-memory (JVM) objects/structures. The upstream utility 
>> can then work with Arrow objects/structures with the usual performance 
>> benefits. The utility will be very similar to the C++ implementation of 
>> "Convert a vector of row-wise data into an Arrow table" as described here: 
>> https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html.
>> 
>> How useful would this adapter be, and which other Apache projects would 
>> benefit from it? Based on the usability, we can open a JIRA for this 
>> activity and start looking into the implementation details.
>> 
>> Regards,
>> -Atul Dambalkar
>> 
>>
