nsivabalan opened a new pull request #2767: URL: https://github.com/apache/hudi/pull/2767
## What is the purpose of the pull request

Quick start in Hudi has a statically defined schema, but it would be nice for users to try out their own schema with Quick start. This patch adds that capability. Users just need to provide the `toString()` representation of an Avro schema and the record key field, and the datagen tool takes care of the rest (data generation, etc.).

Additional steps in Quick start to try out a custom schema:

```
val userSchema = avroSchema.toString() // avroSchema: your own org.apache.avro.Schema
// Ensure the schema has the field for the record key. The partition path is added internally by the datagen tool,
// so it does not need to be part of userSchema. As of now, only SimpleKeyGenerator is supported for both record key
// and partition path, and the partition path field is hardcoded.
dataGen.instantiateSchema(userSchema, "rowKey") // replace "rowKey" with the field for the record key
```

Sample runbook: https://gist.github.com/nsivabalan/f47805e00996735a054d324a67d4296c

This patch adds a dependency on `io.confluent.avro:avro-random-generator` to help generate random data for a given Avro schema. The dependency is added to `hudi-spark-bundle`. Whether to keep it in this bundle or move it to a test module needs discussion. With a test module, however, users would have to build Hudi locally before they can try this out.

## Brief change log

- Added support to test Quick start with a custom Avro schema.

## Verify this pull request

- Added a test to `TestQuickstartUtils`.
- Also tested Quick start for inserts, updates and deletes: https://gist.github.com/nsivabalan/f47805e00996735a054d324a67d4296c

## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
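The PR relies on avro-random-generator for the actual random values, and that library's API is not shown here. The contract described in the PR can still be illustrated with a minimal, dependency-free Scala sketch. Under that contract, the user schema must contain the record-key field, and the datagen tool adds the partition path from hardcoded values.

Everything in the sketch is hypothetical. The flat `Field`/`FieldType` model, `CustomSchemaDataGenSketch`, `generateInserts`, and the partition-path values are assumptions for illustration. None of them is Hudi's `QuickstartUtils` API or the Confluent library's API. Only flat primitive fields are modeled, not full Avro schemas.

```scala
import scala.util.Random

// Hypothetical model of a flat, Avro-like schema. This is NOT the API of Hudi's
// QuickstartUtils or of io.confluent.avro:avro-random-generator.
sealed trait FieldType
case object StringType extends FieldType
case object IntType extends FieldType
case object DoubleType extends FieldType

final case class Field(name: String, tpe: FieldType)

object CustomSchemaDataGenSketch {
  // Assumed, hardcoded partition-path field name and values (illustrative only).
  val PartitionPathField: String = "partitionpath"
  val PartitionPaths: Seq[String] = Seq("2021/04/01", "2021/04/02", "2021/04/03")

  // Fails fast if the record-key field is missing, then appends the
  // partition-path field unless the user schema already declares it.
  def instantiateSchema(userFields: Seq[Field], recordKeyField: String): Seq[Field] = {
    require(userFields.exists(_.name == recordKeyField),
      s"schema must contain the record key field '$recordKeyField'")
    if (userFields.exists(_.name == PartitionPathField)) userFields
    else userFields :+ Field(PartitionPathField, StringType)
  }

  // Generates one random record (field name -> value). The partition-path field
  // always gets one of the hardcoded values; other fields get a random value of their type.
  def generateRecord(fields: Seq[Field], rnd: Random): Map[String, Any] =
    fields.map { f =>
      val value: Any =
        if (f.name == PartitionPathField) PartitionPaths(rnd.nextInt(PartitionPaths.size))
        else f.tpe match {
          case StringType => rnd.alphanumeric.take(8).mkString
          case IntType    => rnd.nextInt(1000)
          case DoubleType => rnd.nextDouble()
        }
      f.name -> value
    }.toMap

  // Generates n records with a seeded Random so batches are reproducible.
  def generateInserts(fields: Seq[Field], n: Int, seed: Long = 42L): Seq[Map[String, Any]] = {
    val rnd = new Random(seed)
    Seq.fill(n)(generateRecord(fields, rnd))
  }
}
```

For example, `instantiateSchema(Seq(Field("rowKey", StringType), Field("fare", DoubleType)), "rowKey")` returns the two user fields followed by `partitionpath`. Passing that schema to `generateInserts(schema, 5)` returns five random records over those three fields.

Validating the record key up front mirrors the PR's requirement that the schema carry the record-key field. It surfaces a bad schema before any data is generated.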
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
