pengzhiwei2018 edited a comment on pull request #2334:
URL: https://github.com/apache/hudi/pull/2334#issuecomment-799071055


   > @pengzhiwei2018 first of all, thanks for these great contributions.
   > 
   > Wrt inputSchema vs writeSchema, I actually feel writeSchema already stands for inputSchema; input is what is being written, right? We can probably just leave it as is and introduce new `tableSchema` variables as you have in the `HoodieWriteHandle` class?
   > 
   > Like someone else pointed out as well, so far, we are using read and write schemas consistently. Love to not introduce a new input schema, unless it's absolutely necessary.
   
   Hi @vinothchandar, thanks for your reply on this issue.
   Yes, in most cases the `writeSchema` is the same as the `inputSchema` and can stand for it. But in the case covered by this PR (the test case in 
[TestCOWDataSource](https://github.com/apache/hudi/pull/2334/files#diff-9429f5bc432f70ea4801e306dd817416b76e6ab68d41a278e222c989ce5c9824))
 we write the table twice:
   First, we write an "id: long" to the table. The input schema is "a:long" and 
the table schema is "a:long". 
   Second, we write an "id:int" to the table. The input schema is "a:int", but 
the table schema is still "a:long" from the previous write. The write schema 
should be the same as the table schema; otherwise an exception is thrown, which 
is the problem we want to solve in this PR.
   So in this case, we need to distinguish between the `inputSchema` and the 
`writeSchema`. The `inputSchema` is the schema of the incoming records, but the 
`writeSchema` is always the `tableSchema`. 
   
   - The `inputSchema` is used to parse the records from the incoming data.
   - The `tableSchema` is used to write records to and read records from the 
table. Whenever we write to or read from the table, we use the `tableSchema`.
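
   To make the two writes concrete, here is a minimal sketch. It is not Hudi's or Avro's API: a schema is modeled as a plain dict of field name to type name, and `PROMOTIONS`, `can_write`, and `choose_write_schema` are hypothetical names used only for illustration. The only outside fact it relies on is that Avro's schema resolution allows `int` to be promoted to `long`.

```python
# Hypothetical, simplified model: a schema is a dict of field name -> type name.
# None of these names exist in Hudi or Avro; they only illustrate the idea.

# Widening promotions assumed for this sketch (Avro allows int -> long).
PROMOTIONS = {("int", "long")}


def can_write(input_schema, table_schema):
    """Return True if every input field matches the table field or can be widened to it."""
    for name, input_type in input_schema.items():
        table_type = table_schema.get(name)
        if table_type is None:
            return False  # field unknown to the table
        if input_type != table_type and (input_type, table_type) not in PROMOTIONS:
            return False  # e.g. string -> long is not a widening
    return True


def choose_write_schema(input_schema, table_schema):
    """Pick the schema to write with.

    Assumption of this sketch: on the first write there is no table schema yet,
    so the input schema becomes the table schema. Afterwards the write schema
    is always the table schema, and the input schema is only used for parsing.
    """
    if table_schema is None:
        return dict(input_schema)
    if not can_write(input_schema, table_schema):
        raise ValueError("input schema is not compatible with the table schema")
    return table_schema


# First write: input "a:long", no table yet.
table_schema = choose_write_schema({"a": "long"}, None)
# Second write: input "a:int", table is still "a:long".
write_schema = choose_write_schema({"a": "int"}, table_schema)
```

   In this sketch the second write parses incoming records with the `a:int` input schema but still writes with `a:long`, which is the separation between `inputSchema` and `tableSchema` described above.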
   
   
   



