houqp edited a comment on pull request #55: URL: https://github.com/apache/arrow-datafusion/pull/55#issuecomment-827766515
Thanks @jorgecarleitao for the invariant design doc, I wasn't aware of it before. I will definitely go through it and write something up for this PR this weekend. I think we should put some effort into adding that doc to this repo as well. This is the kind of reference I wanted to have while working on the patch set.

My first attempt at this did preserve the invariant that `SELECT t.a FROM t` must result in the schema field name `t.a`, but I later found out that this causes issues when written output is read back: the reader ends up with something like `t.t.a`. So I changed the behavior to be consistent with Spark and MySQL.

For breaking changes in schema fields involving operators, I will do more research on my end. The current behavior is the one that takes the least amount of work for me, but I am leaning towards better compatibility with the current data engineering ecosystem, such as Spark, MySQL and Postgres.

I fully agree with the optimization invariants. I have been thinking about whether there is a better way to enforce them more formally than with unit tests.

@jorgecarleitao to your last question: all the invariants, other than the first one I mentioned in this comment, are pretty straightforward to maintain by tracking the user-provided input in the column struct. I didn't do it because I wasn't aware of these invariants.

@alamb I am with you on the size of this PR as well :( That's why I decided to send it out early, before finishing it: I noticed it just kept growing in size, especially every time I merged with master :P. I will try to see if I can restrict the change to just logical plans, but as @jorgecarleitao mentioned, it might not be easy. I will do it if the overhead turns out to be small.

Anyway, putting the code change aside, I think the most urgent thing is for me to write up a design doc that fully specs out the semantics, so we are all on the same page about what needs to be implemented.
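To make the `t.t.a` problem concrete, here is a minimal sketch. The `Column` struct, its methods, and `read_back` are hypothetical names made up for illustration, not DataFusion's actual types or API; the sketch only assumes that a reader qualifies every stored field name with the table it was registered under.

```rust
/// Hypothetical column reference: an optional table qualifier plus a name.
/// This is an illustration only, not DataFusion's actual column type.
#[derive(Debug, Clone, PartialEq)]
struct Column {
    relation: Option<String>,
    name: String,
}

impl Column {
    /// Field name that ends up in the output schema if the qualifier is kept.
    fn qualified_name(&self) -> String {
        match &self.relation {
            Some(r) => format!("{}.{}", r, self.name),
            None => self.name.clone(),
        }
    }

    /// Field name that ends up in the output schema if the qualifier is dropped.
    fn unqualified_name(&self) -> String {
        self.name.clone()
    }
}

/// Assumed reader behavior: a stored field name becomes a column of the
/// table the file is registered under.
fn read_back(table: &str, field_name: &str) -> Column {
    Column {
        relation: Some(table.to_string()),
        name: field_name.to_string(),
    }
}

fn main() {
    // SELECT t.a FROM t
    let col = Column {
        relation: Some("t".to_string()),
        name: "a".to_string(),
    };

    // Keeping the qualifier in the written schema, then reading it back as `t`.
    let kept = read_back("t", &col.qualified_name());
    println!("{}", kept.qualified_name()); // prints "t.t.a"

    // Dropping the qualifier in the written schema, then reading it back as `t`.
    let dropped = read_back("t", &col.unqualified_name());
    println!("{}", dropped.qualified_name()); // prints "t.a"
}
```

Under these assumptions, writing the unqualified name is what keeps a write followed by a read stable.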
