nsivabalan opened a new pull request #2767:
URL: https://github.com/apache/hudi/pull/2767


   ## What is the purpose of the pull request
   
   Quick start in Hudi has a statically defined schema, but it would be nice 
for users to try out their own schema with QuickStart. This patch adds that 
capability. Users just need to provide the toString representation of an Avro 
schema and the record key field; the rest (data generation, etc.) is taken care 
of by the datagen tool.
   
   Additional steps in Quick start to try out a custom schema:
   ```
   // Replace ??? with the toString() of an Avro schema.
   val userSchema: String = ???
   // Ensure the schema has the field for the record key. The partition path is
   // added internally by the datagen tool, so it need not be part of userSchema.
   // As of now, only SimpleKeyGenerator is supported for both record key and
   // partition path, and the partition path field is hardcoded.

   // Replace "rowKey" with the field for the record key.
   dataGen.instantiateSchema(userSchema, "rowKey")
   ```
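
   As a rough illustration of what `userSchema` could look like, the sketch 
below builds one from a JSON definition. It assumes a spark-shell session with 
Avro on the classpath; the record name, namespace, and every field other than 
"rowKey" are hypothetical and not part of this patch. The final call is left 
commented out because `dataGen` comes from the existing Quick start steps.

   ```scala
   import org.apache.avro.Schema

   // Hypothetical schema for illustration; only "rowKey" matches the PR text.
   val userSchemaJson =
     """{
       |  "type": "record",
       |  "name": "UserRecord",
       |  "namespace": "example",
       |  "fields": [
       |    {"name": "rowKey", "type": "string"},
       |    {"name": "ts", "type": "long"},
       |    {"name": "amount", "type": "double"}
       |  ]
       |}""".stripMargin

   // Parsing validates the JSON; toString yields the string form of the schema.
   val parsedSchema: Schema = new Schema.Parser().parse(userSchemaJson)
   val userSchema: String = parsedSchema.toString

   // Fail early if the record key field is missing from the schema.
   val recordKeyField = "rowKey"
   require(parsedSchema.getField(recordKeyField) != null,
     s"schema has no field named $recordKeyField")

   // dataGen is assumed to be the generator created earlier in Quick start:
   // dataGen.instantiateSchema(userSchema, recordKeyField)
   ```

   No partition path field is declared here, since the datagen tool adds it 
internally.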
   Sample run book: 
https://gist.github.com/nsivabalan/f47805e00996735a054d324a67d4296c
   
   This patch adds a dependency on io.confluent.avro:avro-random-generator to 
assist in generating random data for a given Avro schema. The dependency is 
added to hudi-spark-bundle; whether to keep it in this bundle or move it to a 
test module needs discussion. With a test module, however, users would have to 
build locally before they can try it out. 
   
   ## Brief change log
   
     - Added support for trying out QuickStart with a custom Avro schema. 
   
   ## Verify this pull request
   
   - Added a test to TestQuickstartUtils.
   - Also tested Quick start for inserts, updates and deletes: 
https://gist.github.com/nsivabalan/f47805e00996735a054d324a67d4296c
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

