[jira] [Created] (CALCITE-2040) Create adapter for Apache Arrow

Julian Hyde (JIRA) Tue, 07 Nov 2017 10:55:18 -0800

Julian Hyde created CALCITE-2040:
------------------------------------

             Summary: Create adapter for Apache Arrow
                 Key: CALCITE-2040
                 URL: https://issues.apache.org/jira/browse/CALCITE-2040
             Project: Calcite
          Issue Type: Bug
            Reporter: Julian Hyde
            Assignee: Julian Hyde



Create an adapter for [Apache Arrow|http://arrow.apache.org/]. This would allow 
people to execute SQL statements, via JDBC or ODBC, on data stored in Arrow 
in-memory format.

Because Arrow is an in-memory format, this is not as straightforward as 
reading, say, CSV files using the file adapter: an Arrow data set does not 
have a URL (unless we use Arrow's 
[Feather|https://blog.cloudera.com/blog/2016/03/feather-a-fast-on-disk-format-for-data-frames-for-r-and-python-powered-by-apache-arrow/]
format, or an in-memory file system such as Alluxio). So we would need to 
devise a way of addressing Arrow data sets.
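
One possible addressing scheme, sketched here purely as an illustration, is a 
registry that maps a name to an in-memory data set, so that a schema could 
refer to "arrow:trades" rather than to a URL. The names ArrowRegistry and 
ColumnBatch, the "arrow:" prefix, and the use of plain Java arrays as a 
stand-in for Arrow vectors are all assumptions made for this sketch; none of 
them is an existing Calcite or Arrow API.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for an Arrow record batch: named columns of equal length. */
final class ColumnBatch {
  final Map<String, Object[]> columns;
  final int rowCount;

  ColumnBatch(Map<String, Object[]> columns) {
    int n = -1;
    for (Object[] column : columns.values()) {
      if (n >= 0 && column.length != n) {
        throw new IllegalArgumentException("columns must have equal length");
      }
      n = column.length;
    }
    // Defensive copy of the map that preserves column order.
    this.columns = Collections.unmodifiableMap(new LinkedHashMap<>(columns));
    this.rowCount = Math.max(n, 0);
  }
}

/** Hypothetical registry that gives in-memory data sets a stable address. */
final class ArrowRegistry {
  private static final String PREFIX = "arrow:";  // assumed address scheme
  private final Map<String, ColumnBatch> dataSets = new ConcurrentHashMap<>();

  /** Registers a data set under a name and returns its address. */
  String register(String name, ColumnBatch batch) {
    if (dataSets.putIfAbsent(name, batch) != null) {
      throw new IllegalStateException("already registered: " + name);
    }
    return PREFIX + name;
  }

  /** Looks up a data set by address; empty if the address is unknown. */
  Optional<ColumnBatch> resolve(String address) {
    if (!address.startsWith(PREFIX)) {
      return Optional.empty();
    }
    return Optional.ofNullable(dataSets.get(address.substring(PREFIX.length())));
  }
}
```

A table factory could then accept such an address as an operand in the model 
file, in the same position where the file adapter accepts a path.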

Also, since Arrow is an extremely efficient format for processing data, it 
would be good to have Arrow as a calling convention; that is, to provide 
implementations of relational operators such as Filter, Project and Aggregate, 
in addition to just TableScan.
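
To make the idea of operators working in a columnar convention more concrete, 
the following sketch shows a Filter and a Project that operate on whole 
columns rather than on rows. It uses plain int[] arrays as a stand-in for 
Arrow vectors and does not touch Calcite's planner interfaces; the names 
IntBatch, filter and project are hypothetical.

```java
import java.util.function.IntPredicate;

/** Hypothetical columnar batch of int columns: columns[c][row]. */
final class IntBatch {
  final String[] names;
  final int[][] columns;

  IntBatch(String[] names, int[][] columns) {
    if (names.length != columns.length) {
      throw new IllegalArgumentException("one name per column required");
    }
    this.names = names;
    this.columns = columns;
  }

  int rowCount() {
    return columns.length == 0 ? 0 : columns[0].length;
  }

  /** Filter: keeps the rows for which the predicate holds on one column. */
  IntBatch filter(int columnIndex, IntPredicate predicate) {
    int[] key = columns[columnIndex];
    // Pass 1: build a selection vector of the row indexes that qualify.
    int[] selection = new int[key.length];
    int selected = 0;
    for (int row = 0; row < key.length; row++) {
      if (predicate.test(key[row])) {
        selection[selected++] = row;
      }
    }
    // Pass 2: gather the selected rows, one column at a time.
    int[][] out = new int[columns.length][selected];
    for (int c = 0; c < columns.length; c++) {
      for (int i = 0; i < selected; i++) {
        out[c][i] = columns[c][selection[i]];
      }
    }
    return new IntBatch(names, out);
  }

  /** Project: picks a subset of columns; the column arrays are shared, not copied. */
  IntBatch project(int... columnIndexes) {
    String[] outNames = new String[columnIndexes.length];
    int[][] out = new int[columnIndexes.length][];
    for (int i = 0; i < columnIndexes.length; i++) {
      outNames[i] = names[columnIndexes[i]];
      out[i] = columns[columnIndexes[i]];
    }
    return new IntBatch(outNames, out);
  }
}
```

In this layout a Project costs nothing per row, and a Filter touches only the 
predicate column until it gathers the surviving rows.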

Lastly, when we have an Arrow convention, if we build adapters for file formats 
(for instance the bioinformatics formats SAM, VCF, FASTQ discussed in 
CALCITE-2025) it would make a lot of sense to translate those formats directly 
into Arrow (applying simple projects and filters first if applicable). Those 
adapters would fit better as a "contrib" module in the Arrow project than in 
Calcite.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
