Yicong-Huang commented on code in PR #5682:
URL: https://github.com/apache/texera/pull/5682#discussion_r3410812131


##########
amber/src/test/python/core/architecture/packaging/test_state_materialization_e2e.py:
##########
@@ -68,192 +74,195 @@
 )
 
 
-# Module-level scratch dir for the sqlite catalog + iceberg warehouse.
-# We don't initialize `StorageConfig` here: other test modules (e.g.
-# test_iceberg_document.py) also call `StorageConfig.initialize` at
-# import time, and the class rejects re-initialization with
-# RuntimeError. Whichever module gets collected first wins; we adopt
-# its namespaces below.
-_WAREHOUSE_DIR = tempfile.mkdtemp(prefix="texera-state-e2e-warehouse-")
[email protected]
+class TestStateMaterializationE2E:
+    @pytest.fixture(autouse=True)
+    def _init_storage_config(self):
+        """Initialize StorageConfig + IcebergCatalogInstance for the real
+        postgres-backed catalog in the `amber-integration` CI job.
 
+        Critical detail: the Scala integration tests that run earlier in
+        the same job connect to the iceberg catalog DB as user
+        `postgres/postgres` (the storage.conf default for
+        `STORAGE_ICEBERG_CATALOG_POSTGRES_USERNAME/PASSWORD`). pyiceberg
+        creates the catalog's `iceberg_tables` metadata table on first
+        use, owned by whoever wrote first — so it ends up owned by
+        `postgres`. We MUST connect as the same user, otherwise we hit
+        `permission denied for table iceberg_tables`.
 
[email protected](scope="module", autouse=True)
-def sqlite_iceberg_catalog():
-    """Inject a sqlite-backed SqlCatalog so the test runs without external
-    iceberg infra (postgres/minio).
+        Why the reset: `test_iceberg_document.py` also calls
+        `StorageConfig.initialize` at module import time (with a
+        different `texera/password` user that works for it because no
+        Scala writes first in the `pyamber` job where it runs). pytest
+        imports every test module during collection, even ones whose
+        tests will be deselected by `-m integration`, so that
+        initialization happens here too. We force-reset the singletons
+        and re-init with the prod-correct credentials; safe because
+        test_iceberg_document's tests are deselected from this run.
 
-    Module-scoped so all tests in this file share one warehouse, and so
-    namespace creation only happens once. We save/restore the original
-    `IcebergCatalogInstance` singleton so other test modules that expect
-    a real postgres-backed catalog (e.g. test_iceberg_document.py) are
-    not affected by our replacement.
-    """
-    # Some other test module may have initialized StorageConfig already
-    # (it has a single-init lock). If nothing has initialized it yet,
-    # do it here with arbitrary values -- we replace the catalog
-    # instance below so the postgres/rest fields are never exercised.
-    if not StorageConfig._initialized:
+        Credentials read the same `STORAGE_ICEBERG_CATALOG_POSTGRES_*`
+        env vars the production code consumes (via storage.conf), so
+        the test matches whichever identity the Scala side uses in the
+        same job. Defaults to `postgres/postgres` (the storage.conf
+        default that `amber-integration` runs with) when unset.
+        """
+        StorageConfig._initialized = False
+        IcebergCatalogInstance._instance = None
         StorageConfig.initialize(
             catalog_type="postgres",
-            postgres_uri_without_scheme="unused",
-            postgres_username="unused",
-            postgres_password="unused",
-            rest_catalog_uri="unused",
-            rest_catalog_warehouse_name="unused",
+            postgres_uri_without_scheme=os.environ.get(
+                "STORAGE_ICEBERG_CATALOG_POSTGRES_URI_WITHOUT_SCHEME",
+                "localhost:5432/texera_iceberg_catalog",
+            ),
+            postgres_username=os.environ.get(
+                "STORAGE_ICEBERG_CATALOG_POSTGRES_USERNAME", "postgres"
+            ),
+            postgres_password=os.environ.get(
+                "STORAGE_ICEBERG_CATALOG_POSTGRES_PASSWORD", "postgres"
+            ),
+            rest_catalog_uri="http://localhost:8181/catalog/";,
+            rest_catalog_warehouse_name="texera",
             table_result_namespace="operator-port-result",
             table_state_namespace="operator-port-state",
-            directory_path=_WAREHOUSE_DIR,
+            
directory_path=tempfile.mkdtemp(prefix="texera-state-e2e-warehouse-"),
             commit_batch_size=4096,
-            s3_endpoint="unused",
-            s3_region="unused",
-            s3_auth_username="unused",
-            s3_auth_password="unused",
+            s3_endpoint="http://localhost:9000";,
+            s3_region="us-east-1",
+            s3_auth_username="minioadmin",
+            s3_auth_password="minioadmin",

Review Comment:
   Addressed in 9753074b5 — replaced the hardcoded S3 creds with 
`os.environ.get(...)` reads of the same `STORAGE_S3_*` env vars storage.conf 
consumes, defaulting to the storage.conf defaults 
(`texera_minio`/`password`/`us-west-2`, `texera-large-binaries` for the 
large-binaries bucket). The test now stays aligned with whichever creds the 
surrounding job provisions.



##########
amber/src/test/python/core/architecture/packaging/test_state_materialization_e2e.py:
##########
@@ -68,192 +74,195 @@
 )
 
 
-# Module-level scratch dir for the sqlite catalog + iceberg warehouse.
-# We don't initialize `StorageConfig` here: other test modules (e.g.
-# test_iceberg_document.py) also call `StorageConfig.initialize` at
-# import time, and the class rejects re-initialization with
-# RuntimeError. Whichever module gets collected first wins; we adopt
-# its namespaces below.
-_WAREHOUSE_DIR = tempfile.mkdtemp(prefix="texera-state-e2e-warehouse-")
[email protected]
+class TestStateMaterializationE2E:
+    @pytest.fixture(autouse=True)
+    def _init_storage_config(self):

Review Comment:
   Addressed in 9753074b5 — promoted `_init_storage_config` from function-scope 
to `scope="class"`. The reset of the StorageConfig + IcebergCatalogInstance 
singletons and the tempdir warehouse allocation now happen once per class 
instead of once per test.



##########
.github/workflows/build.yml:
##########
@@ -316,25 +316,19 @@ jobs:
     # license check stay in `amber`; this job is tests-only.
     if: ${{ inputs.run_amber_integration }}
     strategy:
+      # macOS provisions postgres / minio / lakekeeper natively because
+      # GitHub-hosted macOS runners have no Docker (and `services:`
+      # containers are Linux-only). Each docker-dependent step below
+      # branches on $RUNNER_OS inside its `run:` script: Linux keeps
+      # the docker image, macOS uses brew + the upstream
+      # aarch64-apple-darwin lakekeeper tarball.
       matrix:
-        os: [ubuntu-22.04]
+        os: [ubuntu-22.04, macos-latest]
         java-version: [17]

Review Comment:
   The PR description was out of date — sorry for the confusion. Updated it to 
call out the macOS-on-amber-integration scope expansion explicitly. The intent 
isn't to dodge macOS runner cost (the old line is removed); it's that we want 
CI coverage that the integration stack actually runs on macOS dev machines, not 
just Linux. The macOS docker-free path (brew + lakekeeper aarch64-apple-darwin 
tarball + brew protobuf for Rosetta-free protoc) is now documented in the 
description as the second change in this PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to