pengzhiwei2018 commented on pull request #2645:
URL: https://github.com/apache/hudi/pull/2645#issuecomment-803355840
> Getting started on this. Sorry for the delay.
>
> How important are the changes around writeSchema vs inputSchema and such
> changes to the SQL implementation?
Hi @vinothchandar, thanks for your review.
It is necessary to introduce `inputSchema` and `tableSchema` to replace the
original `writeSchema` for MergeInto.
For example:
Merge Into h0 using (
  select id, name, flag from s) as s0
on s0.id = h0.id
when matched and flag = 'u' then update set id = s0.id, name = s0.name
when not matched then insert (id, name) values (s0.id, s0.name)
The input is `select id, name, flag from s`, whose schema is `(id, name,
flag)`. But the record written to the table is `(id, name)` after the
update and insert clauses are applied. The input schema is therefore not equal
to the write schema, so the original single `writeSchema` cannot cover this
scenario.
I introduced `inputSchema` and `tableSchema` to solve this problem. The
`inputSchema` is used to parse the incoming record, and the `tableSchema` is
used to write records to and read records from the table.
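To make the distinction concrete, here is a rough Python sketch of the idea. It is not Hudi code: `parse_input`, `merge_into`, and the in-memory `h0` dict are made up for illustration, and it assumes the update clause is meant to write `id = s0.id, name = s0.name`.

```python
# Illustrative stand-in only; Hudi's real implementation works on Avro records.
INPUT_SCHEMA = ("id", "name", "flag")  # schema of "select id, name, flag from s"
TABLE_SCHEMA = ("id", "name")          # schema of the target table h0


def parse_input(row):
    """Decode an incoming source row using the input schema."""
    if len(row) != len(INPUT_SCHEMA):
        raise ValueError(f"expected {len(INPUT_SCHEMA)} fields, got {len(row)}")
    return dict(zip(INPUT_SCHEMA, row))


def merge_into(table, source_rows):
    """Apply the MergeInto example to `table`, a dict keyed by id."""
    for row in source_rows:
        s0 = parse_input(row)
        # Project the source record onto the table's columns: `flag` is only
        # needed to evaluate the match condition and is never written.
        to_write = {col: s0[col] for col in TABLE_SCHEMA}
        if s0["id"] in table:
            # "when matched and flag = 'u' then update"
            if s0["flag"] == "u":
                table[s0["id"]] = to_write
        else:
            # "when not matched then insert"
            table[s0["id"]] = to_write
    return table


h0 = {1: {"id": 1, "name": "a1"}}
source = [(1, "a1_new", "u"), (2, "a2", "i"), (1, "ignored", "x")]
result = merge_into(h0, source)
```

The source rows can only be decoded with the three-column input schema, while everything stored in `h0` has the two-column table schema. A single `writeSchema` would have to play both roles at once.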
In most cases other than MergeInto, the `inputSchema` is the same as the
`tableSchema`, so it should not affect the original logic, IMO.