houqp edited a comment on pull request #55:
URL: https://github.com/apache/arrow-datafusion/pull/55#issuecomment-828880763


   @jorgecarleitao looking more into the logical optimization invariants, I 
think we might want to relax them a little to account for optimizations 
that may change column order: 
https://github.com/apache/arrow-datafusion/blob/57eeb64659b9ca9c496a959f7716090fb32085b6/datafusion/src/optimizer/hash_build_probe_order.rs#L122-L133.
   
   So basically something like this:
   
   * If the outer plan is a projection, we can guarantee strict schema 
invariants for logical optimization, i.e. we preserve the exact same schema 
field vector.
   * If the outer plan is not a projection, we only guarantee that the same 
set of schema fields is preserved, but not their order.
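
   To make the two cases concrete, here is a rough sketch of what the relaxed check could look like. The `Field` struct, the `field` helper and `schema_invariant_holds` are hypothetical stand-ins for illustration, not DataFusion's actual types or API:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a schema field; not DataFusion's actual type.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Field {
    name: String,
    data_type: String,
}

/// Hypothetical convenience constructor used only in this sketch.
fn field(name: &str, data_type: &str) -> Field {
    Field { name: name.to_string(), data_type: data_type.to_string() }
}

/// Counts occurrences of each field, so the comparison ignores order
/// but still notices missing or duplicated fields.
fn field_counts(fields: &[Field]) -> HashMap<&Field, usize> {
    let mut counts = HashMap::new();
    for f in fields {
        *counts.entry(f).or_insert(0) += 1;
    }
    counts
}

/// Returns true if the optimized schema satisfies the proposed invariant:
/// exact field vector under a projection, same multiset of fields otherwise.
fn schema_invariant_holds(before: &[Field], after: &[Field], outer_is_projection: bool) -> bool {
    if outer_is_projection {
        before == after
    } else {
        field_counts(before) == field_counts(after)
    }
}

fn main() {
    let before = vec![field("a", "Int32"), field("b", "Utf8")];
    let swapped = vec![field("b", "Utf8"), field("a", "Int32")];
    // A reordered schema is accepted only when the outer plan is not a projection.
    println!("non-projection: {}", schema_invariant_holds(&before, &swapped, false));
    println!("projection: {}", schema_invariant_holds(&before, &swapped, true));
}
```

   Comparing counts rather than plain sets is a choice made for this sketch, so that a dropped or duplicated field still fails the check.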
   
   From the user's point of view, this also makes sense: if I am executing a 
query like `SELECT * FROM t`, I am basically saying "just give me all the 
columns, in whatever order".
   
   Technically, we could still enforce strict schema invariants for all plans 
by manually wrapping the plan in a projection when the outer plan is not a 
projection. But I think this adds unnecessary execution overhead for minor 
semantic gain.
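
   For reference, the wrapping projection would essentially need to map each original column to its position in the reordered output. A minimal sketch of that mapping, assuming column names are unique within a schema (`restoring_projection` is a hypothetical helper, not an existing DataFusion function):

```rust
/// Hypothetical helper: for each column name in the original order, find its
/// index in the reordered schema. Returns None if any column is missing.
/// Assumes column names are unique within a schema.
fn restoring_projection(original: &[&str], reordered: &[&str]) -> Option<Vec<usize>> {
    original
        .iter()
        .map(|name| reordered.iter().position(|c| c == name))
        .collect()
}

fn main() {
    // The optimizer moved column "c" to the front; the projection indices
    // below would put the columns back in their original order.
    let indices = restoring_projection(&["a", "b", "c"], &["c", "a", "b"]);
    println!("{:?}", indices);
}
```

   The extra projection node is what would add the execution overhead mentioned above.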
   
   Interested in what others think about this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

