houqp commented on pull request #55:
URL: https://github.com/apache/arrow-datafusion/pull/55#issuecomment-827766515


   Thanks @jorgecarleitao for the invariant design doc, I wasn't aware of it 
before. I will definitely go through it and write something up for this PR this 
weekend. I think we should put some effort into adding that doc to this 
repo as well. This is the kind of reference I have wanted while working 
on the patch set.
   
   My first attempt at this did preserve the invariant that `SELECT t.a FROM t` 
must result in the schema field name `t.a`, but I later found out that this 
causes issues when data is written out and then read back. Basically, the 
reader ends up with something like `t.t.a`. So I changed the behavior to be 
consistent with Spark and MySQL.
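   
   To make the `t.t.a` problem concrete, here is a minimal sketch. It is not 
DataFusion code: `qualified_name` and `round_trip` are hypothetical helpers, 
and the assumption that the reader qualifies every field with the table name 
is only a simplified model of what I observed.

```rust
/// Hypothetical helper: builds a field name of the form `relation.column`.
fn qualified_name(relation: &str, column: &str) -> String {
    format!("{}.{}", relation, column)
}

/// Simplified model of writing the result of `SELECT t.a FROM t` and then
/// reading it back as a table registered under the same relation name.
fn round_trip(relation: &str, column: &str, keep_qualifier: bool) -> String {
    // Field name stored in the written schema.
    let written = if keep_qualifier {
        qualified_name(relation, column) // e.g. "t.a"
    } else {
        column.to_string() // e.g. "a"
    };
    // Assumption: the reader qualifies each stored field name again.
    qualified_name(relation, &written)
}
```

   In this model, `round_trip("t", "a", true)` yields `t.t.a`, while 
`round_trip("t", "a", false)` yields `t.a`.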
   
   For breaking changes in schema fields involving operators, I will do more 
research on my end. The current behavior is the one that requires the least 
amount of work from me, but I am leaning towards better compatibility with the 
current data engineering ecosystem, such as Spark, MySQL and Postgres.
   
   I fully agree with the optimization invariants. I have been thinking about 
whether there is a better way to enforce them more formally than with unit tests.
   
   @jorgecarleitao to your last question: all the invariants, other than the 
first one I mentioned in this comment, are pretty straightforward to maintain 
by tracking the user-provided input in the column struct. I didn't do it 
because I wasn't aware of these invariants.
   
   @alamb I am with you on the size of this PR as well :( That's why I decided 
to send it out early, before finishing it: I noticed it just kept growing 
in size, especially every time I merged with master :P. I will try to see 
if I can restrict the change to just logical plans, but as @jorgecarleitao 
mentioned, it might not be easy. I will do it if the overhead turns out to be 
small.
   
   Anyway, putting the code changes aside, I think the most urgent thing is 
for me to write up a design doc that fully specs out the semantics, so we are 
all on the same page about what needs to be implemented.



