[ https://issues.apache.org/jira/browse/ARROW-15238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469488#comment-17469488 ]

Weston Pace commented on ARROW-15238:
-------------------------------------

One consequence of this is that the arrow::dataset::internal::Initialize function
should go away. Instead, the scan/write nodes should be added to the exec node
registry statically, when the default exec node registry is created (and that
default registry creation will need to happen in the engine module).

> [C++] Create "engine" module for the query engine
> -------------------------------------------------
>
>                 Key: ARROW-15238
>                 URL: https://issues.apache.org/jira/browse/ARROW-15238
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Weston Pace
>            Priority: Major
>              Labels: query-engine
>             Fix For: 8.0.0
>
>
> Circular dependencies are popping up in the query engine because the compute 
> module is very low level.  For example, it would be nice if the default 
> registry included the scan node and dataset write node.  We will also want to 
> add spillover support at some point, and that will rely on parquet/dataset 
> operations.
> We should create a dedicated engine module which includes the query plans, 
> the nodes, etc.  This module would not contain the kernels or other low level 
> compute primitives.  This way we could have something like...
> engine -> datasets (for scanning) -> parquet -> compute (for calculating statistics)
> The base ExecPlan itself could either go in compute or engine depending on 
> which has the least amount of friction.
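The layering argument can be checked mechanically. The sketch below is a simplified, hypothetical model: modules are graph nodes, an edge "A -> B" means "A depends on B", and a depth-first search reports whether any cycle exists. The "without engine" graph assumes compute's default registry would need dataset's scan/write nodes while dataset depends on compute via parquet; the "with engine" graph is exactly the chain proposed above. Module names are labels only, not real build targets.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Edges read as "key depends on each entry in the vector".
using Graph = std::map<std::string, std::vector<std::string>>;

// DFS cycle detection: a dependency already on the current path is a back edge.
bool HasCycleFrom(const Graph& g, const std::string& node,
                  std::set<std::string>& visiting, std::set<std::string>& done) {
  if (done.count(node)) return false;     // fully explored, no cycle through it
  if (visiting.count(node)) return true;  // back edge => cycle
  visiting.insert(node);
  auto it = g.find(node);
  if (it != g.end()) {
    for (const auto& dep : it->second) {
      if (HasCycleFrom(g, dep, visiting, done)) return true;
    }
  }
  visiting.erase(node);
  done.insert(node);
  return false;
}

bool HasCycle(const Graph& g) {
  std::set<std::string> visiting, done;
  for (const auto& kv : g) {
    if (HasCycleFrom(g, kv.first, visiting, done)) return true;
  }
  return false;
}

// Hypothetical "before": compute's default registry wants dataset's nodes.
const Graph kWithoutEngine = {
    {"compute", {"dataset"}},  // default registry includes scan/write nodes
    {"dataset", {"parquet"}},  // datasets (for scanning) -> parquet
    {"parquet", {"compute"}},  // parquet -> compute (for statistics)
};

// "After": the chain proposed in the issue, with nodes moved into engine.
const Graph kWithEngine = {
    {"engine", {"dataset"}},
    {"dataset", {"parquet"}},
    {"parquet", {"compute"}},
    {"compute", {}},
};
```

Moving the plans and nodes up into engine turns the dependency graph into a chain, so compute can stay a leaf with no knowledge of datasets or parquet.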



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
