[GitHub] [hudi] pengzhiwei2018 commented on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support

GitBox Sat, 20 Mar 2021 07:09:00 -0700


pengzhiwei2018 commented on pull request #2645:
URL: https://github.com/apache/hudi/pull/2645#issuecomment-803355840



   > Getting started on this. Sorry for the delay.
   > 
   > How important are the changes around writeSchema vs inputSchema and such changes to the SQL implementation?
   
   Hi @vinothchandar, thanks for your review.
   It is necessary to introduce `inputSchema` and `tableSchema` to replace the original `writeSchema` for MergeInto.
   For example:
   
   Merge Into h0 using (
     select id, name, flag from s) as s0
   on s0.id = h0.id
   when matched and flag = 'u' then update set id = s0.id, name = s0.name
   when not matched then insert (id, name) values(s0.id, s0.name)
   
   The input is `select id, name, flag from s`, whose schema is `(id, name, flag)`. But the record written to the table is `(id, name)` after the update and insert clauses are translated. The `inputSchema` is therefore not equal to the `writeSchema`, so the original `writeSchema` alone cannot satisfy this scenario.
   I introduced `inputSchema` and `tableSchema` to solve this problem. The `inputSchema` is used to parse the incoming record, and the `tableSchema` is used to write records to and read records from the table.
   In most cases other than MergeInto, the `inputSchema` is the same as the `tableSchema`, so it should not affect the original logic, IMO.
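   To make the two-schema idea concrete, here is a rough sketch in plain Python. It is not Hudi code: `INPUT_SCHEMA`, `TABLE_SCHEMA`, `parse_input` and `merge_into` are hypothetical names used only for illustration, and the table is modelled as an in-memory dict keyed by `id`. It only shows that the incoming row is decoded with one schema while the stored record follows another.

```python
# Hypothetical sketch: none of these names are Hudi APIs.
INPUT_SCHEMA = ("id", "name", "flag")  # schema of "select id, name, flag from s"
TABLE_SCHEMA = ("id", "name")          # schema of the target table h0


def parse_input(row):
    """Decode an incoming row using the input schema."""
    if len(row) != len(INPUT_SCHEMA):
        raise ValueError(f"expected {len(INPUT_SCHEMA)} fields, got {len(row)}")
    return dict(zip(INPUT_SCHEMA, row))


def merge_into(table, source_rows):
    """Apply the MergeInto example to `table`, a dict keyed by id.

    Each stored value is a tuple laid out according to TABLE_SCHEMA.
    """
    for row in source_rows:
        s0 = parse_input(row)
        matched = s0["id"] in table
        if matched and s0["flag"] == "u":
            # Matched update clause: rewrite the stored record from s0,
            # keeping only the table's columns (flag is dropped).
            table[s0["id"]] = tuple(s0[col] for col in TABLE_SCHEMA)
        elif not matched:
            # Not-matched insert clause: store (s0.id, s0.name).
            table[s0["id"]] = tuple(s0[col] for col in TABLE_SCHEMA)
        # Matched rows with any other flag are left untouched.
    return table


h0 = {1: (1, "a1")}
source = [(1, "a1_new", "u"), (2, "a2", "i"), (1, "ignored", "x")]
result = merge_into(h0, source)
```

   The `flag` column is needed to evaluate the matched condition but never reaches the table, which is why a single write schema cannot describe both sides in this sketch.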
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
