Re: [PR] [FLINK-39499][tests] Replace MinIO with SeaweedFS [flink]

via GitHub Thu, 18 Jun 2026 12:23:44 -0700


spuru9 commented on code in PR #28369:
URL: https://github.com/apache/flink/pull/28369#discussion_r3438372862



##########
flink-end-to-end-tests/test-scripts/test_batch_wordcount.sh:
##########
@@ -84,7 +85,7 @@ start_cluster
 # The test may run against different source types.
 # But the sources should provide the same test data, so the checksum stays the 
same for all tests.
 ${FLINK_DIR}/bin/flink run -p 1 ${FLINK_DIR}/examples/streaming/WordCount.jar 
${ARGS}
-# Fetches result from AWS s3 to the OUTPUT_PATH, no-op for other filesystems 
and minio-based tests
+# Fetches result from AWS s3 to the OUTPUT_PATH, no-op for other filesystems 
and seaweedfs-based tests

Review Comment:
   nit: This comment is now wrong — hadoop_seaweedfs does fetch from S3 (it 
sets fetch_complete_result). Only presto_seaweedfs_read is the no-op case.
   
   ```suggestion
   # Fetches result from AWS s3 to the OUTPUT_PATH, no-op for other filesystems 
and the presto_seaweedfs_read test
   ```



##########
flink-filesystems/flink-s3-fs-hadoop/src/test/java/org/apache/flink/fs/s3hadoop/S5CmdOnHadoopS3FileSystemITCase.java:
##########
@@ -18,8 +18,8 @@
 
 package org.apache.flink.fs.s3hadoop;
 
-import org.apache.flink.fs.s3.common.HAJobRunOnMinioS3StoreITCase;
-import org.apache.flink.fs.s3.common.S5CmdOnMinioITCase;
+import org.apache.flink.fs.s3.common.HAJobRunOnSeaweedFsS3StoreITCase;
+import org.apache.flink.fs.s3.common.S5CmdOnSeaweedFsITCase;
 
-/** Runs the {@link HAJobRunOnMinioS3StoreITCase} on the Hadoop S3 file 
system. */
-public class S5CmdOnHadoopS3FileSystemITCase extends S5CmdOnMinioITCase {}
+/** Runs the {@link HAJobRunOnSeaweedFsS3StoreITCase} on the Hadoop S3 file 
system. */

Review Comment:
   nit: this was a existing issue, the `@link` points at 
HAJobRunOnSeaweedFsS3StoreITCase, but the test actually runs 
S5CmdOnSeaweedFsITCase. The `HAJobRun…` import is only kept alive by that link, 
so we can drop it too while fixing the doc.
   
   ```suggestion
   /** Runs the {@link S5CmdOnSeaweedFsITCase} on the Hadoop S3 file system. */
   ```
   
   Same thing in S5CmdOnPrestoS3FileSystemITCase (just swap "Hadoop" → 
"Presto").



##########
flink-filesystems/flink-s3-fs-base/src/test/java/org/apache/flink/fs/s3/common/S5CmdOnSeaweedFsITCase.java:
##########
@@ -103,17 +104,18 @@ public abstract class S5CmdOnMinioITCase {
 
     private static Configuration createConfiguration() {
         final Configuration config = new Configuration();
-        getMinioContainer().setS3ConfigOptions(config);
+        getSeaweedFsContainer().setS3ConfigOptions(config);
         File credentialsFile = new File(temporaryDirectory, "credentials");
 
         try {
             // It looks like on the CI machines s5cmd by default is using some 
other default
             // authentication mechanism, that takes precedence over passing 
secret and access keys
             // via environment variables. For example maybe there exists a 
credentials file in the
-            // default location with secrets from the S3, not MinIO. To 
circumvent it, lets use our
-            // own credentials file with secrets for MinIO.
+            // default location with secrets from the S3, not SeaweedFs. To 
circumvent it, lets use
+            // our

Review Comment:
   nit: dangling comment line.



##########
flink-end-to-end-tests/test-scripts/common_s3_seaweedfs.sh:
##########
@@ -107,12 +108,17 @@ function s3_setup_with_provider {
   add_optional_plugin "s3-fs-$1"
   # reads (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
   set_config_key "$2" 
"com.amazonaws.auth.EnvironmentVariableCredentialsProvider"
-  # change endpoint to minio's location
+  # change endpoint to seaweedfs's location
   set_config_key "s3.endpoint" "$S3_ENDPOINT"
-  # If the test is using virtual host style (default), then it tries to reach 
minio on <bucket>.localhost:<port>,
+  # If the test is using virtual host style (default), then it tries to reach 
seaweedfs on <bucket>.localhost:<port>,
   # which docker does not properly forward.
   set_config_key "s3.path.style.access" "true"
   set_config_key "s3.path-style-access" "true"
 }
 
 source "$(dirname "$0")"/common_s3_operations.sh
+
+# SeaweedFS does not expose on-disk directories as S3 buckets,
+# so the bucket must be created explicitly
+aws_cli s3 mb "s3://$IT_CASE_S3_BUCKET"
+aws_cli s3 cp "/hostdir/test-data/words" "$S3_TEST_DATA_WORDS_URI"

Review Comment:
   nit: we fire off the bucket-create the moment the container flips to 
Running, but that just means the process started, not that the S3 gateway is 
actually listening yet. Probably fine since the awscli container takes a sec to 
come up anyway, but a tiny retry here would save us a confusing nightly failure 
if it ever loses the race:
   
   ```suggestion
   retry_times 5 2 "aws_cli s3 mb s3://$IT_CASE_S3_BUCKET"
   aws_cli s3 cp "/hostdir/test-data/words" "$S3_TEST_DATA_WORDS_URI"
   ```
   
   Totally fine to skip if it's been solid for you.



##########
flink-filesystems/flink-s3-fs-base/src/test/java/org/apache/flink/fs/s3/common/S5CmdOnSeaweedFsITCase.java:
##########
@@ -73,20 +73,21 @@
 import static org.assertj.core.api.Assertions.assertThat;
 
 /**
- * {@code HAJobRunOnMinioS3StoreITCase} covers a job run where the HA data is 
stored in Minio. The
- * implementation verifies whether the {@code JobResult} was written into the 
FileSystem-backed
- * {@code JobResultStore}.
+ * {@code HAJobRunOnSeaweedFsS3StoreITCase} covers a job run where the HA data 
is stored in

Review Comment:
   nit: Also an existing thing, Copy-paste from the HA test — this Javadoc 
describes HAJobRun, not s5cmd. Since you're already rewriting the line, mind 
making it actually describe this class?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [FLINK-39499][tests] Replace MinIO with SeaweedFS [flink]

Reply via email to