[GitHub] [arrow-datafusion] houqp edited a comment on pull request #55: Support qualified columns in queries

GitBox Wed, 28 Apr 2021 20:58:29 -0700


houqp edited a comment on pull request #55:
URL: https://github.com/apache/arrow-datafusion/pull/55#issuecomment-828880763



   @jorgecarleitao looking more into the logical optimization invariants, I 
think we might want to relax them a little to account for optimizations 
that may change column order, such as: 
https://github.com/apache/arrow-datafusion/blob/57eeb64659b9ca9c496a959f7716090fb32085b6/datafusion/src/optimizer/hash_build_probe_order.rs#L122-L133.
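   A minimal sketch of why swapping join inputs reorders columns. It assumes, as a simplification, that a join's output schema is the left input's columns followed by the right input's columns, and models a schema as a list of qualified column names; `join_schema` is a hypothetical helper, not DataFusion's API.

```rust
// Hypothetical helper: a join's output schema is modeled as the left
// input's columns followed by the right input's columns.
fn join_schema(left: &[&str], right: &[&str]) -> Vec<String> {
    left.iter().chain(right.iter()).map(|c| c.to_string()).collect()
}

fn main() {
    let t1 = ["t1.id", "t1.name"];
    let t2 = ["t2.id", "t2.price"];

    // Original plan: t1 JOIN t2.
    let original = join_schema(&t1, &t2);
    // After an optimizer swaps the build/probe sides: t2 JOIN t1.
    let swapped = join_schema(&t2, &t1);

    // The field order differs...
    assert_ne!(original, swapped);

    // ...but the set of fields is the same.
    let mut a = original.clone();
    let mut b = swapped.clone();
    a.sort();
    b.sort();
    assert_eq!(a, b);

    println!("original: {:?}", original);
    println!("swapped:  {:?}", swapped);
}
```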
   
   So basically something like this:
   
   * If a projection plan is the root node, we can guarantee strict schema 
invariants for logical optimization, i.e. we preserve the exact same schema 
field vector.
   * If the root node is not a projection plan, we only guarantee that the same 
set of schema fields is preserved, not their order.
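   The two rules above could be checked roughly like this. This is a sketch under stated assumptions: `Plan`, `schema_invariant_holds`, and `cols` are hypothetical names on a toy plan model, not DataFusion's `LogicalPlan`. The relaxed check compares sorted field lists so that duplicate columns are still counted.

```rust
// Hypothetical, simplified plan model for illustration only.
#[derive(Debug)]
enum Plan {
    Projection { columns: Vec<String> },
    Other { columns: Vec<String> },
}

impl Plan {
    fn schema(&self) -> &[String] {
        match self {
            Plan::Projection { columns } | Plan::Other { columns } => columns,
        }
    }
}

// Same fields with the same multiplicity, in any order.
fn same_fields_any_order(a: &[String], b: &[String]) -> bool {
    let mut a = a.to_vec();
    let mut b = b.to_vec();
    a.sort();
    b.sort();
    a == b
}

// Strict check (identical field vector) when the original root is a
// projection; relaxed check (same fields, any order) otherwise.
fn schema_invariant_holds(before: &Plan, after: &Plan) -> bool {
    match before {
        Plan::Projection { .. } => before.schema() == after.schema(),
        Plan::Other { .. } => same_fields_any_order(before.schema(), after.schema()),
    }
}

fn cols(names: &[&str]) -> Vec<String> {
    names.iter().map(|s| s.to_string()).collect()
}

fn main() {
    // Non-projection root: reordering is allowed.
    let before = Plan::Other { columns: cols(&["t1.id", "t2.id"]) };
    let after = Plan::Other { columns: cols(&["t2.id", "t1.id"]) };
    assert!(schema_invariant_holds(&before, &after));

    // Projection root: the exact order must be preserved.
    let before = Plan::Projection { columns: cols(&["t1.id", "t2.id"]) };
    let after = Plan::Projection { columns: cols(&["t2.id", "t1.id"]) };
    assert!(!schema_invariant_holds(&before, &after));
}
```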
   
   From the user's point of view, this also makes sense: if I am executing a 
query like `SELECT * FROM t`, I am basically saying "give me all the 
columns, in whatever order."
   
   Technically, we could still enforce strict schema invariants for all plans 
by manually wrapping the root in a projection plan whenever it is not already 
a projection. But I think this adds unnecessary execution overhead for a minor 
semantic gain.
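   A rough sketch of the wrapping alternative, on a hypothetical toy plan model (`Plan`, `scan`, and `enforce_strict_schema` are illustrative names, not DataFusion's API; the join schema is again assumed to be left columns followed by right columns). It shows where the cost comes from: every reordered non-projection root gains an extra projection node, and hence an extra operator at execution time.

```rust
// Hypothetical, simplified plan model; not DataFusion's actual LogicalPlan.
#[derive(Debug, Clone)]
enum Plan {
    Scan { columns: Vec<String> },
    Join { left: Box<Plan>, right: Box<Plan> },
    Projection { input: Box<Plan>, columns: Vec<String> },
}

impl Plan {
    fn schema(&self) -> Vec<String> {
        match self {
            Plan::Scan { columns } => columns.clone(),
            // Assumed: join output = left columns followed by right columns.
            Plan::Join { left, right } => {
                let mut s = left.schema();
                s.extend(right.schema());
                s
            }
            Plan::Projection { columns, .. } => columns.clone(),
        }
    }
}

// If the optimized root is not a projection and its column order differs
// from the original, wrap it in a projection restoring the original order.
fn enforce_strict_schema(original: &Plan, optimized: Plan) -> Plan {
    let expected = original.schema();
    if matches!(optimized, Plan::Projection { .. }) || optimized.schema() == expected {
        optimized
    } else {
        // This extra node is the execution overhead being traded off.
        Plan::Projection { input: Box::new(optimized), columns: expected }
    }
}

fn scan(cols: &[&str]) -> Plan {
    Plan::Scan { columns: cols.iter().map(|c| c.to_string()).collect() }
}

fn main() {
    let t1 = scan(&["t1.id", "t1.name"]);
    let t2 = scan(&["t2.id", "t2.price"]);
    let original = Plan::Join { left: Box::new(t1.clone()), right: Box::new(t2.clone()) };

    // Suppose an optimizer swapped the join inputs, changing column order.
    let optimized = Plan::Join { left: Box::new(t2), right: Box::new(t1) };

    let fixed = enforce_strict_schema(&original, optimized);
    assert!(matches!(fixed, Plan::Projection { .. }));
    assert_eq!(fixed.schema(), original.schema());
}
```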
   
   Interested in what others think about this.

