[
https://issues.apache.org/jira/browse/ARROW-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16012462#comment-16012462
]
Anthony Fox commented on ARROW-1036:
------------------------------------
This is quite similar to a project the GeoMesa team has been working on.
The GeoMesa project has started to put together a SQL-like API over Arrow files
in JavaScript for in-browser querying and visualization. We have defined a
class called {{ArrowDataSet}} that wraps an Arrow file and exposes query and
countBy/groupBy methods. The queries are defined using a set of simple
predicate expressions ({{And}}, {{Or}}, {{Equals}}, {{LTEquals}}, {{During}},
etc.), with the idea of eventually adding spatial predicates ({{Contains}},
{{Intersects}}, {{Overlaps}}). The query is received by the {{ArrowDataSet}}
and a query execution plan is produced. The query execution plan has the usual
operators ({{Scan}}, {{Filter}}, {{Project}}, {{HashGroupBy}}) as well as
optimized Filters for dictionary-encoded values. We are also planning on
having a primary sort key that is hinted through the Arrow column metadata,
along with appropriate optimizations via additional operators like
{{PrimarySortKeyScan}}. This will help with seeks when there's a predicate on
the primary sort key. For instance, if the primary sort key is {{date}} and
there's a query predicate using {{During}} on a start and end date, then the
execution plan will use {{PrimarySortKeyScan}} to efficiently skip batches until
it reaches records that pass the predicate.
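
To make the batch-skipping idea concrete, here is a rough sketch in Python (not our actual JavaScript code). The {{During}} and {{Batch}} classes and the {{primary_sort_key_scan}} function are hypothetical stand-ins, and the sketch assumes that rows are globally sorted by the primary sort key, that {{During}} is inclusive at both ends, and that the first and last row of a batch give its key range.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterator, List, Optional

Row = Dict[str, object]


@dataclass(frozen=True)
class During:
    """Hypothetical range predicate; both endpoints are assumed inclusive."""
    field: str
    start: date
    end: date


@dataclass
class Batch:
    """Stand-in for an Arrow record batch; rows are sorted by the sort key."""
    rows: List[Row]


def primary_sort_key_scan(
    batches: List[Batch],
    pred: During,
    visited: Optional[List[int]] = None,
) -> Iterator[Row]:
    """Yield rows matching pred, skipping batches outside the key range."""
    for index, batch in enumerate(batches):
        if not batch.rows:
            continue
        first = batch.rows[0][pred.field]
        last = batch.rows[-1][pred.field]
        if last < pred.start:
            continue  # whole batch precedes the range: skip it
        if first > pred.end:
            break  # data is sorted, so no later batch can match
        if visited is not None:
            visited.append(index)  # record which batches were actually read
        for row in batch.rows:
            if pred.start <= row[pred.field] <= pred.end:
                yield row


batches = [
    Batch([{"date": date(2017, 1, 1), "type": "a"},
           {"date": date(2017, 1, 15), "type": "b"}]),
    Batch([{"date": date(2017, 2, 1), "type": "a"},
           {"date": date(2017, 2, 20), "type": "b"}]),
    Batch([{"date": date(2017, 3, 5), "type": "a"},
           {"date": date(2017, 3, 25), "type": "a"}]),
    Batch([{"date": date(2017, 4, 2), "type": "a"}]),
]

visited: List[int] = []
during = During("date", date(2017, 2, 10), date(2017, 3, 10))
# The residual predicate (type == "a") is applied as an ordinary filter
# on top of the scan.
hits = [row for row in primary_sort_key_scan(batches, during, visited)
        if row["type"] == "a"]
```

Only the two middle batches are read here. A real implementation would presumably take the key range from batch-level statistics or the column metadata hint rather than from the first and last rows.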
I'd be interested in how this can be standardized across the languages
supported by Arrow.
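
For reference, the optimized filter for dictionary-encoded values mentioned above can be sketched in a language-neutral way. This is an illustrative Python sketch rather than our implementation: {{dictionary_filter}} is a hypothetical name, the column is modeled as a plain list of dictionary values plus a list of integer indices, and nulls are ignored.

```python
from typing import Callable, List, Sequence


def dictionary_filter(
    dictionary: Sequence[str],
    indices: Sequence[int],
    predicate: Callable[[str], bool],
) -> List[int]:
    """Return the row positions whose decoded value satisfies predicate."""
    # Evaluate the predicate once per distinct dictionary value,
    # not once per row.
    matching_codes = {
        code for code, value in enumerate(dictionary) if predicate(value)
    }
    if not matching_codes:
        return []  # no dictionary entry matches, so no row can match
    # Per row, only an integer set-membership test remains.
    return [pos for pos, code in enumerate(indices) if code in matching_codes]


dictionary = ["car", "truck", "boat"]
indices = [0, 2, 1, 0, 2, 2]
rows = dictionary_filter(dictionary, indices, lambda value: value == "boat")
```

The predicate runs once per distinct value instead of once per row, which is what makes this cheaper than decoding every value first.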
> [C++] Define abstract API for filtering Arrow streams (e.g. predicate
> evaluation)
> ---------------------------------------------------------------------------------
>
> Key: ARROW-1036
> URL: https://issues.apache.org/jira/browse/ARROW-1036
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Wes McKinney
>
> It would be useful to be able to apply analytic predicates to an Arrow stream
> in a composable way. As soon as we are able to compute some simple predicates
> on in-memory Arrow data, we could define our first version of this