[
https://issues.apache.org/jira/browse/HUDI-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17449744#comment-17449744
]
Danny Chen commented on HUDI-2267:
----------------------------------
Fixed via master branch: c7a5c8273b0d67cd436caec650ffb36e1a74f9e0
> Test suite infra Automate with playbook
> ---------------------------------------
>
> Key: HUDI-2267
> URL: https://issues.apache.org/jira/browse/HUDI-2267
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Usability
> Reporter: sivabalan narayanan
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Build a test infra (a suite of tests) that can be run with Jenkins or CI
> (optionally), and that can also be scripted to run on a cluster/AWS infra.
> Purpose:
> There are a lot of additional features in Hudi that do not get exercised when
> developing new features. Non-core features such as clustering, archival, and
> the bulk_insert row writer path don't get the necessary attention while a
> particular feature is being developed. So we need a test infra that one can
> leverage. One should be able to trigger a script called certify_patch (or
> similar), and it should run all the different tests one could possibly hit
> out in the wild and report whether all flows succeeded or anything failed.
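A minimal sketch of what such a certify_patch entry point could look like: run every test flow and report an overall pass/fail plus the names of failed flows. All names here (run order, `FLOWS`, the stand-in flow callables) are illustrative assumptions, not actual Hudi test-suite APIs; real flows would submit Spark jobs against a Hudi table.

```python
from typing import Callable, Dict, List, Tuple

# Stand-in flows: each is a callable returning True on success.
# In the real suite these would drive Hudi write/read operations.
FLOWS: Dict[str, Callable[[], bool]] = {
    "bulk_insert": lambda: True,
    "upsert": lambda: True,
    "clustering": lambda: True,
}

def certify_patch(flows: Dict[str, Callable[[], bool]]) -> Tuple[bool, List[str]]:
    """Run all flows; return (all_passed, names_of_failed_flows)."""
    failed = [name for name, flow in flows.items() if not flow()]
    return (not failed, failed)
```

A wrapper script would call `certify_patch` once per table type and exit non-zero if any flow failed, which makes it easy to wire into Jenkins or CI.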
> Operations to be verified:
> For both table types:
> bulk insert, insert, upsert, delete, insert overwrite, insert overwrite
> table, delete partition.
> bulk_insert row writer path with the above operations.
> Verify that cleaning and archival get triggered and executed as expected for
> both of the above flows.
> Clustering.
> Metadata table.
> For MOR:
> Compaction.
> Clustering and compaction one after another.
> Clustering and compaction triggered concurrently.
> Note: for all tests, verify the sanity of the data after every test, i.e.
> save the input data and verify it against the Hudi dataset.
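The sanity check above could be sketched as follows: fold the saved input batches into the expected table state (latest write per record key wins, matching upsert semantics) and compare it against what the table actually contains. This is an assumption-laden illustration; reading the real table would go through Spark, and the record key field name (`"key"`) is hypothetical.

```python
from typing import Dict, Iterable, List

def expected_state(batches: Iterable[List[dict]]) -> Dict[str, dict]:
    """Fold successive write batches: later batches overwrite earlier keys."""
    state: Dict[str, dict] = {}
    for batch in batches:
        for rec in batch:
            state[rec["key"]] = rec
    return state

def verify(table_rows: List[dict], batches: Iterable[List[dict]]) -> bool:
    """True iff the table contents equal the folded input batches."""
    expected = expected_state(batches)
    actual = {rec["key"]: rec for rec in table_rows}
    return actual == expected
```

Running this check after every operation (not just at the end) makes it much easier to pinpoint which flow corrupted the data.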
> * The test infra should have the capability to test with a schema of the
> user's choice.
> * It should be able to test all 3 levels (write client, deltastreamer, spark
> datasource). Some operations may not be feasible to test at all levels, but
> that's understandable.
> * Once we have end-to-end support for Spark, we need to add support for
> Flink and Java as well. The scope for Java might be smaller since there is
> no Spark datasource layer, but we can revisit that once the Spark engine is
> covered.
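One way to structure the level (and later, engine) coverage is a test matrix that enumerates every (level, operation) combination and skips the ones that are not feasible. The skip set below is a hypothetical placeholder, not a statement about which combinations Hudi actually supports.

```python
from itertools import product

LEVELS = ["write_client", "deltastreamer", "spark_datasource"]
OPERATIONS = ["bulk_insert", "insert", "upsert", "delete"]

# Hypothetical example of an infeasible combination; the real list
# would be filled in as each level is wired up.
SKIP = {("deltastreamer", "delete")}

def test_matrix():
    """Yield every runnable (level, operation) combination."""
    for combo in product(LEVELS, OPERATIONS):
        if combo not in SKIP:
            yield combo
```

Adding Flink or Java later would then be a matter of extending `LEVELS` (or an analogous engines axis) rather than duplicating the suite.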
> Publish a playbook on how to use this test infra, both with an already
> released version and with a locally built Hudi bundle jar.
> * cluster/AWS run
> * local docker run.
> * CI integration
> Future scope:
> We can make the versions of Spark, Hadoop, Hive, etc. configurable down the
> line, but for the first cut we wanted to get an end-to-end flow working
> smoothly. It should be usable by anyone from the community or a new user who
> is looking to adopt Hudi.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)