[
https://issues.apache.org/jira/browse/ARROW-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17240026#comment-17240026
]
Neville Dipale commented on ARROW-10732:
----------------------------------------
We should be able to implement projection logic without needing to change the
Field type, or even the IPC representation. For Arrow it should be easy, but
for Parquet, it's a lot more work.
We can reference nested Arrow arrays by walking the schema tree, and getting the
arrays that we want from that, e.g. if we have `a.b.c` and `a.b.d` but want
`a.b`, we can grab `b` which will give us `b.c` and `b.d`.
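As a rough illustration of the tree walk described above, here is a minimal sketch in plain Rust. The `Node` type, `project` method, and `example_schema` helper are hypothetical stand-ins invented for this example; they are not the arrow crate's `Field` or `Schema` API, and a real implementation would carry data types and the arrays themselves.

```rust
// Toy stand-in for a nested schema node. Hypothetical: NOT the arrow crate's `Field`.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    name: String,
    children: Vec<Node>,
}

impl Node {
    fn leaf(name: &str) -> Self {
        Node { name: name.to_string(), children: Vec::new() }
    }

    fn parent(name: &str, children: Vec<Node>) -> Self {
        Node { name: name.to_string(), children }
    }

    // Walk a dotted path such as "a.b" from this node and return the
    // matching subtree, or None if any path segment is missing.
    fn project(&self, path: &str) -> Option<&Node> {
        let mut parts = path.split('.');
        // The first segment must name this node itself.
        if parts.next()? != self.name {
            return None;
        }
        let mut current = self;
        for part in parts {
            current = current.children.iter().find(|c| c.name == part)?;
        }
        Some(current)
    }
}

// Builds the schema from the example: a { b { c, d } }.
fn example_schema() -> Node {
    Node::parent(
        "a",
        vec![Node::parent("b", vec![Node::leaf("c"), Node::leaf("d")])],
    )
}
```

Calling `example_schema().project("a.b")` returns the `b` node with both children `c` and `d` still attached, which is the "grab `b`" case above.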
With Parquet, getting `b` means returning `c` and `d` as 2 separate arrays.
There was a PR that implemented this logic in Java/parquet-mr, but I can't seem
to find it in my browser history anymore.
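To make the Parquet case concrete: since only leaf columns are stored, projecting `a.b` amounts to selecting every leaf whose dotted path sits under that prefix. The sketch below assumes leaves are identified by dotted path strings; `select_leaves` is a hypothetical helper for illustration, not a function from parquet-mr or the Rust parquet crate.

```rust
// Hypothetical helper: given the dotted paths of all leaf columns, return
// the ones that fall under `prefix` (or equal it exactly).
fn select_leaves<'a>(leaves: &[&'a str], prefix: &str) -> Vec<&'a str> {
    leaves
        .iter()
        .copied()
        .filter(|&leaf| match leaf.strip_prefix(prefix) {
            // Require a segment boundary so that "a.b" does not match "a.bb".
            Some(rest) => rest.is_empty() || rest.starts_with('.'),
            None => false,
        })
        .collect()
}
```

With leaves `["a.b.c", "a.b.d", "a.e"]` and prefix `"a.b"`, this selects `a.b.c` and `a.b.d`, i.e. two separate leaf arrays that would then have to be reassembled into `b`.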
I don't know if we need a design doc; I think a draft PR that's clear enough
could also work. We can implement the projection logic on the `Schema` or
`RecordBatch` in Arrow, then use that logic in DataFusion.
What do you think?
> [Rust] [DataFusion] Add SQL support for table/relation aliases and compound
> identifiers
> ---------------------------------------------------------------------------------------
>
> Key: ARROW-10732
> URL: https://issues.apache.org/jira/browse/ARROW-10732
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Rust - DataFusion
> Affects Versions: 3.0.0
> Reporter: Andy Grove
> Assignee: Andy Grove
> Priority: Major
>
> We need to support referencing columns in queries using table name and/or
> alias prefixes so that we can support use cases such as joins between tables
> that have duplicate column names.
> For example:
> {code:java}
> SELECT t1.id, t1.name, t2.name FROM t1 JOIN t2 ON t1.id = t2.id {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)