lw309637554 commented on pull request #2263: URL: https://github.com/apache/hudi/pull/2263#issuecomment-748632649
> > This is great, thanks @satishkotha!
> >
> > I have completed a first pass. Don't have major concerns. Maybe we can work through these initial comments, as I complete the remainder.
> >
> > On follow ups
> >
> > * IIUC inline clustering should work as-is from datasource/deltastreamer paths with this change, by passing necessary configs. We should create two JIRAs, one each for support for async clustering via datasource and deltastreamer?
> > * Can you share how much testing on a production environment has been done for this?
>
> @vinothchandar
>
> * Looks like @lw309637554 created https://issues.apache.org/jira/browse/HUDI-1399 for async clustering and there's some good discussion there.
> * I did some basic testing in a staging environment, mostly with inline clustering. I have another PR for test suite changes to validate async clustering via calling writeClient APIs. I hope to get more production-scale tests over the next week.

@satishkotha I have added inline clustering unit tests for the Spark datasource and deltastreamer in my local branch. Once this PR is merged, I can open a pull request for them. Then, based on those unit tests, I will land the independent clustering Spark job, followed by async clustering via datasource and via deltastreamer. cc @vinothchandar

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
