jon-wei commented on a change in pull request #9714:
URL: https://github.com/apache/druid/pull/9714#discussion_r417035061
##########
File path: integration-tests/README.md
##########
@@ -214,31 +214,32 @@ of the integration test run discussed above. This is because druid
 test clusters might not, in general, have access to hadoop.
 This also applies to integration test that uses Hadoop HDFS as an inputSource or as a deep storage.
 To run integration test that uses Hadoop, you will have to run a Hadoop cluster. This can be done in two ways:
+1) Run Druid Docker test clusters with Hadoop container by passing -Dstart.hadoop.docker=true to the mvn command.
 1) Run your own Druid + Hadoop cluster and specified Hadoop configs in the configuration file (CONFIG_FILE).
-2) Run Druid Docker test clusters with Hadoop container by passing -Dstart.hadoop.docker=true to the mvn command.
 Currently, hdfs-deep-storage and other <cloud>-deep-storage integration test groups can only be run with
 Druid Docker test clusters by passing -Dstart.hadoop.docker=true to start Hadoop container.
 You will also have to provide -Doverride.config.path=<PATH_TO_FILE> with your Druid's Hadoop configs set.
 See integration-tests/docker/environment-configs/override-examples/hdfs directory for example.
 Note that if the integration test you are running also uses other cloud extension (S3, Azure, GCS), additional
-credentials/configs may need to be set in the same file as your Druid's Hadoop configs set.
+credentials/configs may need to be set in the same file as your Druid's Hadoop configs set.
 Currently, ITHadoopIndexTest can only be run with your own Druid + Hadoop cluster by following the below steps:
-Create a directory called batchHadoop1 in the hadoop file system
-(anywhere you want) and put batch_hadoop.data (integration-tests/src/test/resources/hadoop/batch_hadoop.data)
-into that directory (as its only file).
-
-Add this keyword to the configuration file (see above):
+- Copy wikipedia_index_data1.json, wikipedia_index_data2.json, and wikipedia_index_data3.json
+  located in integration-tests/src/test/resources/data/batch_index/json to your HDFS at /batch_index/json/
+  If using the Docker-based Hadoop container, this is automatically done by the integration tests.
+- Copy batch_hadoop.data located in integration-tests/src/test/resources/data/batch_index/tsv to your HDFS
+  at /batch_index/tsv/
+  If using the Docker-based Hadoop container, this is automatically done by the integration tests.
+Run the test using mvn (using the bundled Docker-based Hadoop cluster):
 ```
-  "hadoopTestDir": "<name_of_dir_containing_batchHadoop1>"
+  mvn verify -P integration-tests -Dit.test=ITHadoopIndexTest -Dstart.hadoop.docker=true -Doverride.config.path=docker/environment-configs/override-examples/hdfs -Dextra.datasource.name.suffix=''
 ```
-Run the test using mvn:
-
+Run the test using mvn (using config file for existing Hadoop cluster):
 ```
-  mvn verify -P int-tests-config-file -Dit.test=ITHadoopIndexTest
+  mvn verify -P int-tests-config-file -Dit.test=ITHadoopIndexTest -Doverride.config.path=docker/environment-configs/override-examples/hdfs -Dextra.datasource.name.suffix=''
Review comment:
Removed that part for the manual example
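
For anyone trying the manual (non-Docker) path, the copy steps in the hunk above can be sketched as a shell dry run. The source and destination paths come from the diff itself; the `hdfs` CLI invocation is an assumption about the reader's cluster, so the commands are only printed here for review rather than executed:

```shell
# Dry-run sketch of the manual HDFS staging steps from the README diff.
# Assumption: an `hdfs` CLI configured for your cluster. We only build and
# print the commands; pipe the output to `sh` (or drop the printf) to run them.
SRC=integration-tests/src/test/resources/data/batch_index
CMDS=""
for f in wikipedia_index_data1.json wikipedia_index_data2.json wikipedia_index_data3.json; do
  CMDS="${CMDS}hdfs dfs -put $SRC/json/$f /batch_index/json/
"
done
CMDS="${CMDS}hdfs dfs -put $SRC/tsv/batch_hadoop.data /batch_index/tsv/"
printf '%s\n' "$CMDS"
```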
##########
File path:
integration-tests/src/test/java/org/apache/druid/tests/TestNGGroup.java
##########
@@ -84,6 +82,15 @@
    */
   public static final String HDFS_DEEP_STORAGE = "hdfs-deep-storage";
+  public static final String HADOOP_S3_TO_S3 = "hadoop-s3-to-s3-deep-storage";
Review comment:
Updated as suggested
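
As a sanity check on how the new group constant would be exercised: maven-failsafe/TestNG conventionally select groups via `-Dgroups=<name>`. The exact flag wiring in Druid's integration-tests profile is an assumption here (verify against the README), so this sketch only echoes the candidate command:

```shell
# Dry-run sketch: invoking the TestNG group added in this hunk.
# Assumption: the integration-tests profile forwards -Dgroups to TestNG,
# as maven-failsafe conventionally does; nothing is executed here.
GROUP="hadoop-s3-to-s3-deep-storage"
CMD="mvn verify -P integration-tests -Dgroups=$GROUP -Dstart.hadoop.docker=true"
echo "$CMD"
```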
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]