[ https://issues.apache.org/jira/browse/CALCITE-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318935#comment-17318935 ]
Julian Hyde commented on CALCITE-2040:
--------------------------------------

[~mmior] and Karshit,

Can you tell me how you created the {{test.arrow}} file? Before we merge to master, I want to remove this binary file from source control. If you could provide a Java program that generates {{test.arrow}}, I could take it from there. (I will probably hook it into the Gradle build scripts, so that test resources get generated. I will probably also generate Arrow files that contain the scott data set (EMP, DEPT, etc.).)

> Create adapter for Apache Arrow
> -------------------------------
>
>                 Key: CALCITE-2040
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2040
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Julian Hyde
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Create an adapter for [Apache Arrow|http://arrow.apache.org/]. This would
> allow people to execute SQL statements, via JDBC or ODBC, on data stored in
> Arrow's in-memory format.
> Since Arrow is an in-memory format, it is not as straightforward as reading,
> say, CSV files using the file adapter: an Arrow data set does not have a URL.
> (Unless we use Arrow's
> [Feather|https://blog.cloudera.com/blog/2016/03/feather-a-fast-on-disk-format-for-data-frames-for-r-and-python-powered-by-apache-arrow/]
> format, or use an in-memory file system such as Alluxio.) So we would need
> to devise a way of addressing Arrow data sets.
> Also, since Arrow is an extremely efficient format for processing data, it
> would be good to have Arrow as a calling convention. That is, we would
> provide implementations of relational operators such as Filter, Project,
> and Aggregate, in addition to just TableScan.
> Lastly, when we have an Arrow convention, if we build adapters for file
> formats (for instance the bioinformatics formats SAM, VCF, FASTQ discussed in
> CALCITE-2025) it would make a lot of sense to translate those formats
> directly into Arrow (applying simple projects and filters first if
> applicable). Those adapters would fit better as a "contrib" module in the
> Arrow project than in Calcite.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)