[ 
https://issues.apache.org/jira/browse/ARROW-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17240026#comment-17240026
 ] 

Neville Dipale commented on ARROW-10732:
----------------------------------------

We should be able to implement projection logic without needing to change the 
Field type, or even the IPC representation. For Arrow it should be easy, but 
for Parquet, it's a lot more work.

We can reference nested Arrow arrays by walking the schema tree and getting the 
arrays that we want from it. For example, if we have `a.b.c` and `a.b.d` but 
want `a.b`, we can grab `b`, which will give us `b.c` and `b.d`.
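
To illustrate the tree walk, here is a minimal sketch. It assumes a hypothetical 
`Node` type standing in for a nested schema (it is not the arrow crate's `Field`), 
and `project` and `example_schema` are illustrative names only.

```rust
// Hypothetical stand-in for a nested schema node; NOT the arrow crate's `Field`.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    name: String,
    children: Vec<Node>,
}

impl Node {
    fn new(name: &str, children: Vec<Node>) -> Self {
        Node { name: name.to_string(), children }
    }
}

/// Walk the tree along a dotted path such as "a.b" and return the node it
/// names, or `None` if any segment of the path is missing.
fn project<'a>(roots: &'a [Node], path: &str) -> Option<&'a Node> {
    let mut level = roots;
    let mut found = None;
    for part in path.split('.') {
        let node = level.iter().find(|n| n.name == part)?;
        level = &node.children;
        found = Some(node);
    }
    found
}

/// The example from the text: a { b { c, d } }.
fn example_schema() -> Vec<Node> {
    let b = Node::new("b", vec![Node::new("c", vec![]), Node::new("d", vec![])]);
    vec![Node::new("a", vec![b])]
}
```

Calling `project(&example_schema(), "a.b")` returns the `b` node, and its 
children `c` and `d` come along with it, so no change to the field type is needed.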

With Parquet, getting `b` means returning `c` and `d` as 2 separate arrays. 
There was a PR that implemented this logic in Java/parquet-mr, but I can't seem 
to find it in my browser history anymore.
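
As a rough sketch of the Parquet side, projecting a group has to be expanded 
into the leaf columns underneath it, since only leaves are stored as columns. 
The `Node` type and the `leaf`, `group`, and `leaf_paths` helpers below are 
hypothetical and are not part of the parquet crate or parquet-mr.

```rust
// Hypothetical stand-in for a nested schema node; NOT a parquet crate type.
struct Node {
    name: String,
    children: Vec<Node>,
}

/// Build a leaf (primitive) column node.
fn leaf(name: &str) -> Node {
    Node { name: name.to_string(), children: Vec::new() }
}

/// Build a group node with the given children.
fn group(name: &str, children: Vec<Node>) -> Node {
    Node { name: name.to_string(), children }
}

/// Collect the dotted paths of every leaf under `node`.
/// `prefix` is the full dotted path of `node` itself, e.g. "a.b".
fn leaf_paths(node: &Node, prefix: &str, out: &mut Vec<String>) {
    if node.children.is_empty() {
        // A leaf is a real column, so it is selected directly.
        out.push(prefix.to_string());
        return;
    }
    for child in &node.children {
        let path = format!("{}.{}", prefix, child.name);
        leaf_paths(child, &path, out);
    }
}
```

For the group `b` with children `c` and `d`, calling `leaf_paths` with the 
prefix `"a.b"` collects the two leaf paths `a.b.c` and `a.b.d`, which a reader 
would then have to reassemble into a single struct array.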

I don't know if we need a design doc; I think a draft PR that's clear enough 
could also work. We can implement the projection logic on the `Schema` or 
`RecordBatch` in Arrow, then use that logic in DataFusion.

What do you think?

> [Rust] [DataFusion] Add SQL support for table/relation aliases and compound 
> identifiers
> ---------------------------------------------------------------------------------------
>
>                 Key: ARROW-10732
>                 URL: https://issues.apache.org/jira/browse/ARROW-10732
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Rust - DataFusion
>    Affects Versions: 3.0.0
>            Reporter: Andy Grove
>            Assignee: Andy Grove
>            Priority: Major
>
> We need to support referencing columns in queries using table name and/or 
> alias prefixes so that we can support use cases such as joins between tables 
> that have duplicate column names.
> For example:
> {code:java}
> SELECT t1.id, t1.name, t2.name FROM t1 JOIN t2 ON t1.id = t2.id {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
