yihua commented on a change in pull request #5065:
URL: https://github.com/apache/hudi/pull/5065#discussion_r834883287
##########
File path: docker/compose/docker-compose_hadoop284_hive233_spark244.yml
##########
@@ -288,6 +288,8 @@ services:
- ./hadoop.env
depends_on:
- sparkmaster
+ ports:
+ - '4042:4040'
Review comment:
nit: should be `4042:4042`?
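A sketch of the suggested mapping, assuming the container's Spark UI is actually bound to 4042 (if it still listens on the default 4040, the original `4042:4040` would be correct instead):

```yaml
    ports:
      - '4042:4042'   # host:container — both sides match the port the UI listens on
```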
##########
File path: hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/HoodieMultiWriterTestSuiteJob.java
##########
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.integ.testsuite;
+
+import org.apache.hudi.utilities.UtilHelpers;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SparkSession;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.atomic.AtomicInteger;
+
+public class HoodieMultiWriterTestSuiteJob {
Review comment:
nit: add docs for this class on how it can be used. You can copy the
`spark-submit` command here.
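A hedged sketch of what such a class-level doc could look like — the flag names and jar path below are illustrative guesses, not the job's actual JCommander parameter names:

```
/**
 * Spark job that runs several HoodieTestSuiteJob writers concurrently
 * against the same table to exercise multi-writer scenarios.
 *
 * Example invocation (flag names and paths illustrative):
 *
 * spark-submit \
 *   --class org.apache.hudi.integ.testsuite.HoodieMultiWriterTestSuiteJob \
 *   hudi-integ-test-bundle.jar \
 *   --input-base-paths <path1>,<path2> \
 *   --workload-yaml-paths <yaml1>,<yaml2> \
 *   --props-paths <props1>,<props2>
 */
public class HoodieMultiWriterTestSuiteJob {
```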
##########
File path: hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/HoodieTestSuiteJob.java
##########
@@ -211,12 +221,14 @@ public void runTestSuite() {
}
}
- private void stopQuietly() {
- try {
- sparkSession.stop();
- jsc.stop();
- } catch (Exception e) {
- log.error("Unable to stop spark session", e);
+ protected void stopQuietly() {
+ if (stopJsc) {
Review comment:
nit: a better way to programmatically shut down JSC is to allow this
method to be called externally. Inside this class, if auto shutdown is
desired, `stopQuietly()` can be called internally. Otherwise, it should not be
called internally, and it should be shut down by the multi-writer job.
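A minimal sketch of that ownership pattern, with Spark replaced by a `Runnable` so the idea stands alone. The class and method names are illustrative, not the actual Hudi classes; the real job would pass something like `sparkSession::stop` as the stop action:

```java
// Sketch: shutdown is owned either by this job (auto mode) or by an
// external caller such as the multi-writer job.
public class TestSuiteJobSketch {

  private final Runnable stopAction;   // stands in for sparkSession.stop() + jsc.stop()
  private final boolean autoShutdown;  // true when this job owns the Spark contexts
  private boolean stopped = false;

  public TestSuiteJobSketch(Runnable stopAction, boolean autoShutdown) {
    this.stopAction = stopAction;
    this.autoShutdown = autoShutdown;
  }

  public void runTestSuite() {
    // ... run the DAG ...
    if (autoShutdown) {
      stopQuietly();  // standalone run: shut down internally
    }
    // otherwise the external owner calls stopQuietly() when all writers finish
  }

  // public so an external owner can decide when the contexts go down
  public void stopQuietly() {
    if (stopped) {
      return;  // idempotent: safe to call from both the owner and a finally block
    }
    try {
      stopAction.run();
    } catch (Exception e) {
      System.err.println("Unable to stop spark session: " + e);
    }
    stopped = true;
  }

  public boolean isStopped() {
    return stopped;
  }
}
```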
##########
File path: hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/HoodieMultiWriterTestSuiteJob.java
##########
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.integ.testsuite;
+
+import org.apache.hudi.utilities.UtilHelpers;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SparkSession;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.atomic.AtomicInteger;
+
+public class HoodieMultiWriterTestSuiteJob {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieMultiWriterTestSuiteJob.class);
+
+ public static void main(String[] args) throws Exception {
+    final HoodieMultiWriterTestSuiteConfig cfg = new HoodieMultiWriterTestSuiteConfig();
+ JCommander cmd = new JCommander(cfg, args);
+ if (cfg.help || args.length == 0) {
+ cmd.usage();
+ System.exit(1);
+ }
+
+    JavaSparkContext jssc = UtilHelpers.buildSparkContext("workload-generator-" + cfg.outputTypeName
Review comment:
nit: do you want to rename this to something like "multi-writer-test-run-xyz"?
##########
File path: hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/generator/DeltaGenerator.java
##########
@@ -150,21 +152,29 @@ public int getBatchId() {
JavaRDD<GenericRecord> adjustedRDD = null;
if (config.getNumUpsertPartitions() != 0) {
if (config.getNumUpsertPartitions() < 0) {
+        log.info("Generating updates reading from path 111 " + ((DFSDeltaConfig) deltaOutputConfig).getDeltaBasePath());
Review comment:
nit: remove magic number
##########
File path: pom.xml
##########
@@ -51,6 +51,9 @@
<module>packaging/hudi-utilities-bundle</module>
<module>packaging/hudi-timeline-server-bundle</module>
<module>packaging/hudi-trino-bundle</module>
+ <module>docker/hoodie/hadoop</module>
+ <module>hudi-integ-test</module>
+ <module>packaging/hudi-integ-test-bundle</module>
Review comment:
Should these be excluded? The integ-test modules should only be built with `-Pintegration-tests`.
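A hedged sketch of how these modules could be gated behind a profile instead of the top-level `<modules>` list — the profile id is taken from the comment, and the build may already define such a profile to extend:

```xml
<profile>
  <id>integration-tests</id>
  <modules>
    <module>docker/hoodie/hadoop</module>
    <module>hudi-integ-test</module>
    <module>packaging/hudi-integ-test-bundle</module>
  </modules>
</profile>
```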
##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/metrics/HoodieMetricsConfig.java
##########
@@ -180,8 +179,8 @@ public HoodieMetricsConfig build() {
HoodieMetricsJmxConfig.newBuilder().fromProperties(hoodieMetricsConfig.getProps()).build());
hoodieMetricsConfig.setDefaultOnCondition(reporterType == MetricsReporterType.GRAPHITE,
HoodieMetricsGraphiteConfig.newBuilder().fromProperties(hoodieMetricsConfig.getProps()).build());
-      hoodieMetricsConfig.setDefaultOnCondition(reporterType == MetricsReporterType.CLOUDWATCH,
-          HoodieMetricsCloudWatchConfig.newBuilder().fromProperties(hoodieMetricsConfig.getProps()).build());
+      //hoodieMetricsConfig.setDefaultOnCondition(reporterType == MetricsReporterType.CLOUDWATCH,
+      //    HoodieMetricsCloudWatchConfig.newBuilder().fromProperties(hoodieMetricsConfig.getProps()).build());
Review comment:
Are these intended changes?
##########
File path: docker/demo/config/test-suite/multi-writer-2-sds.yaml
##########
@@ -0,0 +1,52 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+dag_name: cow-spark-simple.yaml
+dag_rounds: 3
+dag_intermittent_delay_mins: 0
+dag_content:
+ first_insert:
Review comment:
Do you plan to add control over which partitions the source data falls in, so that two writers can write to different partitions?
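If such control is added, one hypothetical shape it could take in the DAG yaml — the key name below is invented for illustration and is not an existing test-suite option:

```yaml
  first_insert:
    config:
      # hypothetical knob: pin this writer's generated records to specific partitions
      partitions_for_writer: "partition_0,partition_1"
```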
##########
File path: hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/HoodieMultiWriterTestSuiteJob.java
##########
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.integ.testsuite;
+
+import org.apache.hudi.utilities.UtilHelpers;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SparkSession;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.atomic.AtomicInteger;
+
+public class HoodieMultiWriterTestSuiteJob {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieMultiWriterTestSuiteJob.class);
+
+ public static void main(String[] args) throws Exception {
+    final HoodieMultiWriterTestSuiteConfig cfg = new HoodieMultiWriterTestSuiteConfig();
+ JCommander cmd = new JCommander(cfg, args);
+ if (cfg.help || args.length == 0) {
+ cmd.usage();
+ System.exit(1);
+ }
+
+    JavaSparkContext jssc = UtilHelpers.buildSparkContext("workload-generator-" + cfg.outputTypeName
+ + "-" + cfg.inputFormatName, cfg.sparkMaster);
+
+ String[] inputPaths = cfg.inputBasePaths.split(",");
+ String[] yamls = cfg.workloadYamlPaths.split(",");
+ String[] propsFiles = cfg.propsFilePaths.split(",");
Review comment:
nit: verify that the lengths of these configs are the same?
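A sketch of the suggested validation. The class and method names are illustrative; only the comma-splitting logic mirrors the code under review:

```java
// Sketch: fail fast if the three comma-separated lists do not line up
// one-to-one (one yaml and one props file per input path).
public class MultiWriterArgCheck {

  static void validateSameLength(String inputBasePaths, String workloadYamlPaths, String propsFilePaths) {
    int numInputs = inputBasePaths.split(",").length;
    int numYamls = workloadYamlPaths.split(",").length;
    int numProps = propsFilePaths.split(",").length;
    if (numInputs != numYamls || numYamls != numProps) {
      throw new IllegalArgumentException(String.format(
          "Expected equal counts, got %d input paths, %d yamls, %d props files",
          numInputs, numYamls, numProps));
    }
  }
}
```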
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]