zhang-yue1 commented on issue #14297:
URL: https://github.com/apache/hudi/issues/14297#issuecomment-3569418438

   > [@zhang-yue1](https://github.com/zhang-yue1) I checked the code, and found 
that the generated projection schema is only used for internal filegroup reader 
to read data and transform data into output schema. And there is no updating to 
any job conf, curious about how the problem is fixed by adding `.distinct()` to 
the stream, did you make any other changes?
   
   No other changes were made.
   
   The issue is that Hive/Tez sometimes includes duplicate column names in the 
config (e.g. when the query uses filters). The `AvroRuntimeException` is thrown 
during `Schema` construction, because Avro forbids duplicate field names in a record.
   
   Without `.distinct()`, the `Schema` constructor fails immediately and crashes 
the job. The change simply ensures that the input list is valid for creating the 
Avro `Schema` object.
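
   As a rough illustration of the failure mode described above, here is a minimal, 
stdlib-only sketch. It does not use Hudi or Avro: `ProjectionDedup`, 
`validateFieldNames`, and `dedupe` are hypothetical names, and the validator is 
only a stand-in for the duplicate-field check Avro performs when a record schema 
is built (Avro itself throws `AvroRuntimeException`; this sketch throws 
`IllegalStateException`).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ProjectionDedup {

    // Hypothetical stand-in for Avro's record validation:
    // reject the list as soon as a field name repeats.
    static void validateFieldNames(List<String> names) {
        Set<String> seen = new HashSet<>();
        for (String name : names) {
            if (!seen.add(name)) {
                throw new IllegalStateException("Duplicate field " + name);
            }
        }
    }

    // Drop repeated column names; distinct() keeps the first occurrence
    // of each name and preserves encounter order for a List source.
    static List<String> dedupe(List<String> names) {
        return names.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assumed example input: the same column requested twice.
        List<String> requested = Arrays.asList("id", "name", "id");

        try {
            validateFieldNames(requested);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // After de-duplication the same validation passes.
        List<String> unique = dedupe(requested);
        validateFieldNames(unique);
        System.out.println("accepted: " + unique);
    }
}
```

   The point of the sketch is only that de-duplicating the column list before 
schema construction removes the condition that triggers the exception; nothing 
else about the read path needs to change.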
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.