houqp edited a comment on pull request #55: URL: https://github.com/apache/arrow-datafusion/pull/55#issuecomment-828880763
@jorgecarleitao looking more into the logical optimization invariants, I think we might want to relax them a little to account for optimizations that may change column order: https://github.com/apache/arrow-datafusion/blob/57eeb64659b9ca9c496a959f7716090fb32085b6/datafusion/src/optimizer/hash_build_probe_order.rs#L122-L133.

So basically something like this:

* If a projection plan is the root node, we can guarantee strict schema invariants for logical optimization, i.e. we preserve the exact same schema field vector.
* If the root node is not a projection plan, we only guarantee that the same set of schema fields is preserved, but not their order.

From the user's point of view this also makes sense: if I execute a query like `SELECT * FROM t`, I am basically saying "just give me all the columns in whatever order".

Technically, we could still enforce strict schema invariants for all plans by manually wrapping the plan in a projection when the outer plan is not a projection. But I think this adds unnecessary execution overhead for minor semantic gain.

Interested in what others think about this.
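
To make the two levels of guarantee concrete, here is a rough sketch of what the relaxed check could look like. It is only an illustration: `Field` is a simplified stand-in rather than the Arrow type, and `same_fields_in_order`, `same_fields_any_order` and `optimizer_preserved_schema` are hypothetical names, not existing DataFusion functions.

```rust
/// Simplified stand-in for a schema field (hypothetical; not Arrow's `Field`).
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct Field {
    pub name: String,
    pub data_type: String,
}

impl Field {
    pub fn new(name: &str, data_type: &str) -> Self {
        Field {
            name: name.to_string(),
            data_type: data_type.to_string(),
        }
    }
}

/// Strict invariant: the same fields in the same order.
pub fn same_fields_in_order(before: &[Field], after: &[Field]) -> bool {
    before == after
}

/// Relaxed invariant: the same fields, in any order.
/// Both sides are sorted before comparing, so duplicated fields must also
/// appear the same number of times (slightly stricter than a pure set check).
pub fn same_fields_any_order(before: &[Field], after: &[Field]) -> bool {
    let mut b: Vec<&Field> = before.iter().collect();
    let mut a: Vec<&Field> = after.iter().collect();
    b.sort();
    a.sort();
    b == a
}

/// Picks the invariant based on whether the root node of the plan is a
/// projection, as proposed above (assumption: the caller knows the root kind).
pub fn optimizer_preserved_schema(
    root_is_projection: bool,
    before: &[Field],
    after: &[Field],
) -> bool {
    if root_is_projection {
        same_fields_in_order(before, after)
    } else {
        same_fields_any_order(before, after)
    }
}
```

With this shape, an optimizer rule that swaps the two sides of a join under a non-projection root would still pass the check, while the same reordering under a projection root would be flagged.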