[ 
https://issues.apache.org/jira/browse/ARROW-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16012462#comment-16012462
 ] 

Anthony Fox commented on ARROW-1036:
------------------------------------

This is quite similar to a project the GeoMesa team has been working on.

The GeoMesa project has started to put together a SQL-like API over Arrow files 
in JavaScript for in-browser querying and visualization.  We have defined a 
class called {{ArrowDataSet}} that wraps an Arrow file and exposes query and 
countBy/groupBy methods.  The queries are defined using a set of simple 
predicate expressions ({{And}}, {{Or}}, {{Equals}}, {{LTEquals}}, {{During}}, 
etc.), with the idea of eventually adding spatial predicates ({{Contains}}, 
{{Intersects}}, {{Overlaps}}).  The query is received by the {{ArrowDataSet}} 
and a query execution plan is produced.  The query execution plan has the usual 
operators ({{Scan}}, {{Filter}}, {{Project}}, {{HashGroupBy}}) as well as 
optimized Filters for dictionary-encoded values.  We are also planning to 
support a primary sort key, hinted through the Arrow column metadata, along 
with operators that take advantage of it, such as {{PrimarySortKeyScan}}.  
This will help with seeks when there's a predicate on 
the primary sort key.  For instance, if the primary sort key is {{date}} and 
there's a query predicate using {{During}} on a start and end date, then the 
execution plan will use {{PrimarySortKeyScan}} to efficiently skip batches 
until it reaches records that pass the predicate.
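
To make the idea concrete, here is a rough sketch of composable predicates and 
a sort-key-aware scan.  It is an illustration only, not the actual GeoMesa 
code: the function shapes, the plain-object "batches" with {{min}}/{{max}} 
fields, and the inclusive {{During}} bounds are all assumptions, and it does 
not use the real Arrow JavaScript API.

```javascript
// Hypothetical sketch: these helpers only borrow the names from the text above.
// A "batch" is a plain object { min, max, rows } where min/max are the smallest
// and largest primary-sort-key values in the batch (assumed to be available,
// for example from metadata). Dates are ISO strings, which sort lexicographically.

// Predicate constructors: each returns an object with a test(row) method.
const Equals = (field, value) => ({ test: (row) => row[field] === value });
const LTEquals = (field, value) => ({ test: (row) => row[field] <= value });
const During = (field, start, end) => ({
  field, start, end, // kept so a planner could inspect the range
  test: (row) => row[field] >= start && row[field] <= end, // inclusive (assumed)
});
const And = (...preds) => ({ test: (row) => preds.every((p) => p.test(row)) });
const Or = (...preds) => ({ test: (row) => preds.some((p) => p.test(row)) });

// Scan: visit every row of every batch.
function* scan(batches) {
  for (const batch of batches) yield* batch.rows;
}

// PrimarySortKeyScan: batches are assumed to be ordered by the sort key, so
// batches entirely before the range are skipped and the scan stops at the
// first batch entirely after it.
function* primarySortKeyScan(batches, during) {
  for (const batch of batches) {
    if (batch.max < during.start) continue; // whole batch precedes the range
    if (batch.min > during.end) break;      // sorted: nothing later can match
    yield* batch.rows;
  }
}

// Filter: keep only rows that pass the predicate.
function* filter(rows, pred) {
  for (const row of rows) if (pred.test(row)) yield row;
}

// Example data: three batches sorted by date.
const batches = [
  { min: '2017-01-01', max: '2017-01-31',
    rows: [{ date: '2017-01-05', type: 'a' }, { date: '2017-01-31', type: 'b' }] },
  { min: '2017-02-01', max: '2017-02-28',
    rows: [{ date: '2017-02-10', type: 'a' }, { date: '2017-02-20', type: 'a' }] },
  { min: '2017-03-01', max: '2017-03-31',
    rows: [{ date: '2017-03-15', type: 'b' }] },
];

const during = During('date', '2017-02-05', '2017-02-25');
const pred = And(during, Equals('type', 'a'));

// Plan using the sort key versus a plain full scan; both apply the same Filter.
const hits = [...filter(primarySortKeyScan(batches, during), pred)];
const fullScanHits = [...filter(scan(batches), pred)];
```

Both plans should return the same rows; the sort-key plan simply never touches 
the rows in the first and last batches.  The Filter is still applied after the 
scan because a batch that overlaps the range can contain rows outside it.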

I'd be interested in how this can be standardized across the languages 
supported by Arrow.

> [C++] Define abstract API for filtering Arrow streams (e.g. predicate 
> evaluation)
> ---------------------------------------------------------------------------------
>
>                 Key: ARROW-1036
>                 URL: https://issues.apache.org/jira/browse/ARROW-1036
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Wes McKinney
>
> It would be useful to be able to apply analytic predicates to an Arrow stream 
> in a composable way. As soon as we are able to compute some simple predicates 
> on in-memory Arrow data, we could define our first version of this



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
