jordepic commented on code in PR #2441:
URL: https://github.com/apache/iceberg-rust/pull/2441#discussion_r3282278290
##########
dev/docker-compose.yaml:
##########
@@ -147,6 +147,50 @@ services:
       timeout: 5s
       retries: 5
+  # =============================================================================
+  # HDFS - single-node NameNode + DataNode for HDFS tests
+  # =============================================================================
+  # Mirrors apache/opendal's fixtures/hdfs/docker-compose-hdfs-cluster.yml:
+  # same bde2020 images, host networking on both services. Host networking
+  # is required because hdfs-native 0.13.5 connects to the DataNode by IP
+  # from `DatanodeIdProto.ip_addr` (not by hostname). On a Docker bridge network
+  # the DN would register with an unroutable bridge IP; host networking
+  # lets it bind directly in the host network namespace, so the registered
+  # address is host-reachable.
+  #
+  # This works on Linux CI runners. On macOS / Windows Docker Desktop,
+  # host networking has known issues (e.g. unresolvable VM hostname), so
+  # the HDFS integration tests are `#[ignore]`d; CI explicitly opts them
+  # in via `cargo nextest run --run-ignored=only` (see .github/workflows/ci.yml).
+  hdfs-namenode:
+    image: bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
+    network_mode: "host"
+    environment:
+      CLUSTER_NAME: iceberg-rust-test
+      CORE_CONF_fs_defaultFS: hdfs://localhost:8020
+      CORE_CONF_hadoop_http_staticuser_user: root
+      HDFS_CONF_dfs_permissions_enabled: false
+      HDFS_CONF_dfs_replication: 1
+    healthcheck:
+      test: ["CMD-SHELL", "hdfs dfsadmin -safemode get | grep -q OFF"]
+      interval: 5s
+      timeout: 5s
+      retries: 30
+      start_period: 30s
+
+  hdfs-datanode:
+    image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
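The hunk's comment says CI opts the `#[ignore]`d HDFS tests back in. A hypothetical GitHub Actions step for that is sketched below. The step name and the `docker compose up --wait` call are assumptions, not the actual contents of .github/workflows/ci.yml, and the sketch assumes cargo-nextest is already installed. `--wait` makes Compose block until the started services are running, or healthy where a healthcheck is defined, so the NameNode's safe-mode healthcheck gates the test run.

```yaml
# Hypothetical CI step sketch; not copied from .github/workflows/ci.yml.
- name: HDFS integration tests
  run: |
    # Start only the HDFS services. --wait implies detached mode and blocks
    # until they are running, or healthy where a healthcheck is defined.
    docker compose -f dev/docker-compose.yaml up --wait hdfs-namenode hdfs-datanode
    # The HDFS tests are #[ignore]d by default, so opt in to ignored tests only.
    cargo nextest run --run-ignored=only
```

`--run-ignored=only` runs every ignored test in the selected packages, not only the HDFS ones. Narrowing it further would need a nextest filter expression.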
Review Comment:
Fair point. We picked these because apache/opendal uses the exact same
images and tags for their own services-hdfs-native integration tests
(fixtures/hdfs/docker-compose-hdfs-cluster.yml in
apache/opendal). The thinking was: mirror their HDFS fixture exactly so
iceberg-rust's HDFS test infra moves in lockstep with the OpenDAL crate we
depend on.
apache/hadoop:3.5.0 would be more current but isn't a drop-in replacement. bde2020 ships
an envtoconf.py helper that translates HDFS_CONF_* env vars into hdfs-site.xml
properties at startup; apache/hadoop doesn't have an equivalent, so we'd need to
vendor static core-site.xml / hdfs-site.xml under dev/hdfs/, split the entrypoint
into separate hdfs namenode / hdfs datanode commands, and add an
ENSURE_NAMENODE_DIR bootstrap step. Not a problem if you'd prefer that!
I was mainly trying to stay in line with OpenDAL, but I can switch this one over.
The test itself stays the same; only how the image is configured changes.
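Here is one possible shape for the apache/hadoop variant described above, as a minimal sketch. It reuses the image tag mentioned in the comment, keeps host networking and the existing healthcheck, and mounts hypothetical vendored files from dev/hdfs/. Several details are assumptions about the apache/hadoop image that have not been checked against it: the container config path (/opt/hadoop/etc/hadoop), the ENSURE_NAMENODE_DIR value, and the behaviour that ENSURE_NAMENODE_DIR triggers a one-time NameNode format when that directory does not yet exist. The datanode's depends_on is also illustrative.

```yaml
# Hypothetical sketch of an apache/hadoop-based replacement for the bde2020 services.
# Assumes dev/hdfs/core-site.xml and dev/hdfs/hdfs-site.xml are vendored and set the
# properties the bde2020 env vars in the diff produce today:
#   core-site.xml: fs.defaultFS=hdfs://localhost:8020, hadoop.http.staticuser.user=root
#   hdfs-site.xml: dfs.permissions.enabled=false, dfs.replication=1,
#                  dfs.namenode.name.dir=/tmp/hadoop/dfs/name (assumed path)
services:
  hdfs-namenode:
    image: apache/hadoop:3.5.0
    network_mode: "host"            # still needed: hdfs-native dials the DataNode by IP
    command: ["hdfs", "namenode"]   # explicit role instead of a role-specific image
    environment:
      # Assumed to format the NameNode on first start if this directory is missing;
      # must match dfs.namenode.name.dir in the vendored hdfs-site.xml.
      ENSURE_NAMENODE_DIR: /tmp/hadoop/dfs/name
    volumes:
      # Relative to dev/, where docker-compose.yaml lives; the target path is assumed.
      - ./hdfs/core-site.xml:/opt/hadoop/etc/hadoop/core-site.xml:ro
      - ./hdfs/hdfs-site.xml:/opt/hadoop/etc/hadoop/hdfs-site.xml:ro
    healthcheck:
      test: ["CMD-SHELL", "hdfs dfsadmin -safemode get | grep -q OFF"]
      interval: 5s
      timeout: 5s
      retries: 30
      start_period: 30s

  hdfs-datanode:
    image: apache/hadoop:3.5.0
    network_mode: "host"
    command: ["hdfs", "datanode"]
    depends_on:
      hdfs-namenode:
        condition: service_healthy   # wait until the NameNode has left safe mode
    volumes:
      - ./hdfs/core-site.xml:/opt/hadoop/etc/hadoop/core-site.xml:ro
      - ./hdfs/hdfs-site.xml:/opt/hadoop/etc/hadoop/hdfs-site.xml:ro
```

The properties in the header comment are the same ones the CORE_CONF_* / HDFS_CONF_* variables in the diff set. The main change is that they move from environment variables into checked-in XML.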
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]