envomp opened a new pull request, #8738:
URL: https://github.com/apache/hudi/pull/8738
Our current flow is as follows:
fetch schema:
- Fetch the desired table schema in Avro format from the schema registry
- Derive the corresponding dataset schema from the desired table Avro schema
- Convert that dataset schema back to an Avro schema to obtain a unified
schema
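The schema-unification step above can be sketched with a toy model. This is not the actual `HoodieAvroUtils` logic; the `FieldType` enum and `unify` helper are hypothetical stand-ins for Avro types, used only to illustrate dropping fields absent from the table schema and widening enum to string:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SchemaUnifySketch {
    // Hypothetical stand-in for Avro field types.
    enum FieldType { INT, LONG, STRING, ENUM }

    // Keep only the fields the desired table schema declares, and widen
    // ENUM fields to STRING so downstream consumers see a stable datatype.
    static Map<String, FieldType> unify(Map<String, FieldType> desired,
                                        Map<String, FieldType> dataset) {
        Map<String, FieldType> unified = new LinkedHashMap<>();
        for (Map.Entry<String, FieldType> e : desired.entrySet()) {
            FieldType t = dataset.getOrDefault(e.getKey(), e.getValue());
            unified.put(e.getKey(), t == FieldType.ENUM ? FieldType.STRING : t);
        }
        return unified;
    }

    public static void main(String[] args) {
        Map<String, FieldType> desired = new LinkedHashMap<>();
        desired.put("id", FieldType.LONG);
        desired.put("status", FieldType.STRING);

        Map<String, FieldType> dataset = new LinkedHashMap<>();
        dataset.put("id", FieldType.LONG);
        dataset.put("status", FieldType.ENUM);   // enum in the source data
        dataset.put("debug", FieldType.STRING);  // not in table schema: dropped

        System.out.println(unify(desired, dataset));
        // {id=LONG, status=STRING}
    }
}
```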
transform input:
- Consume Kafka via Spark Streaming and receive an `RDD<GenericRecord>`
- Rewrite the RDD to the unified schema to resolve version differences and
end up with the desired schema, where some fields are dropped and some
datatypes are changed (enum => string, for example)
- We use a copy of the `org.apache.hudi.avro.HoodieAvroUtils` class to do
the rewrite and need it to support the aforementioned rewrite procedure
- Hopefully we can land these changes in the public repo, so we don't need
to maintain a custom rewrite class
- Convert the `RDD<GenericRecord>` to a `Dataset<Row>`
There are also situations where we read input from S3 directly, resulting in a
`Dataset<Row>` that needs to be backfilled into the table, so maintaining
control over the schema on the writer side is beneficial for us.
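The record-rewrite step can likewise be sketched without Avro. The `rewrite` helper below is hypothetical, not the actual `HoodieAvroUtils` code; it illustrates the requested behavior on a record modeled as a plain map: drop fields outside the unified schema and convert enum values to their string symbol:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RecordRewriteSketch {
    // Hypothetical stand-in for an Avro enum symbol in an incoming record.
    enum Status { ACTIVE, DELETED }

    // Rewrite one record to the unified schema: keep only the listed fields
    // and stringify enum values (the enum => string conversion this PR asks
    // the rewrite utility to support).
    static Map<String, Object> rewrite(Map<String, Object> record,
                                       List<String> unifiedFields) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String field : unifiedFields) {
            Object v = record.get(field);
            out.put(field, v instanceof Enum ? ((Enum<?>) v).name() : v);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", 42L);
        record.put("status", Status.ACTIVE);
        record.put("internal", "drop-me"); // not part of the table schema

        System.out.println(rewrite(record, List.of("id", "status")));
        // {id=42, status=ACTIVE}  -- status is now a String, not an enum
    }
}
```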
### Change Logs
Rewrite `HoodieAvroUtils` to support enum => string conversion.
### Impact
no impact
### Risk level (write none, low medium or high below)
none
### Documentation Update
no documentation needed
### Contributor's checklist
- [x] Read through [contributor's
guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Change Logs and Impact were stated clearly
- [x] Adequate tests were added if applicable
- [ ] CI passed
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]