gemini-code-assist[bot] commented on code in PR #39142:
URL: https://github.com/apache/beam/pull/39142#discussion_r3488688922


##########
sdks/python/apache_beam/yaml/integration_tests.py:
##########
@@ -556,6 +561,73 @@ def temp_oracle_database():
     yield f"jdbc:oracle:thin:system/oracle@localhost:{port}/XEPDB1"
 
 
[email protected]
+def temp_iceberg_table_with_pk(table_data):
+
+  # Create a temp dir that will be shared between host and container.
+  # We use the exact same path on both to avoid path mapping issues.
+  # We create it in the current working directory (workspace) because
+  # Docker in GitHub Actions often cannot mount directories from /tmp.
+  temp_dir = tempfile.mkdtemp(dir=os.getcwd())
+  os.chmod(temp_dir, 0o777)
+
+  # Start the Iceberg REST catalog container
+  container = DockerContainer("tabulario/iceberg-rest:0.6.0")
+  container.with_exposed_ports(8181)
+  container.with_volume_mapping(temp_dir, temp_dir, mode='rw')
+  container.with_env("HADOOP_USER_NAME", "iceberg")
+  container.with_env("CATALOG_WAREHOUSE", temp_dir)
+  container.with_env(
+      "CATALOG_IO__IMPL", "org.apache.iceberg.hadoop.HadoopFileIO")
+
+  try:
+    container.start()
+
+    ip = container.get_container_host_ip()
+    port = container.get_exposed_port(8181)
+    api_url = f"http://{ip}:{port}";
+
+    # Poll the REST API until it is ready
+    for _ in range(30):
+      try:
+        response = requests.get(f"{api_url}/v1/config")
+        if response.status_code == 200:
+          break
+      except requests.exceptions.ConnectionError:
+        pass
+      time.sleep(1)

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   The `requests.get` call is missing a `timeout` parameter. In CI/CD 
environments, network requests without a timeout can hang indefinitely, causing 
the build to block. Adding a timeout and catching `requests.exceptions.Timeout` 
(or using the broader `requests.exceptions.RequestException`) prevents this.
   
   ```suggestion
       for _ in range(30):
         try:
           response = requests.get(f"{api_url}/v1/config", timeout=5)
           if response.status_code == 200:
             break
         except (requests.exceptions.ConnectionError, 
requests.exceptions.Timeout):
           pass
         time.sleep(1)
   ```



##########
sdks/python/apache_beam/yaml/integration_tests.py:
##########
@@ -556,6 +561,73 @@ def temp_oracle_database():
     yield f"jdbc:oracle:thin:system/oracle@localhost:{port}/XEPDB1"
 
 
[email protected]
+def temp_iceberg_table_with_pk(table_data):
+
+  # Create a temp dir that will be shared between host and container.
+  # We use the exact same path on both to avoid path mapping issues.
+  # We create it in the current working directory (workspace) because
+  # Docker in GitHub Actions often cannot mount directories from /tmp.
+  temp_dir = tempfile.mkdtemp(dir=os.getcwd())
+  os.chmod(temp_dir, 0o777)
+
+  # Start the Iceberg REST catalog container
+  container = DockerContainer("tabulario/iceberg-rest:0.6.0")
+  container.with_exposed_ports(8181)
+  container.with_volume_mapping(temp_dir, temp_dir, mode='rw')
+  container.with_env("HADOOP_USER_NAME", "iceberg")
+  container.with_env("CATALOG_WAREHOUSE", temp_dir)
+  container.with_env(
+      "CATALOG_IO__IMPL", "org.apache.iceberg.hadoop.HadoopFileIO")
+
+  try:
+    container.start()
+
+    ip = container.get_container_host_ip()
+    port = container.get_exposed_port(8181)
+    api_url = f"http://{ip}:{port}";
+
+    # Poll the REST API until it is ready
+    for _ in range(30):
+      try:
+        response = requests.get(f"{api_url}/v1/config")
+        if response.status_code == 200:
+          break
+      except requests.exceptions.ConnectionError:
+        pass
+      time.sleep(1)
+    else:
+      raise RuntimeError("Iceberg REST catalog failed to start in time.")
+
+    # Create namespace 'db'
+    requests.post(
+        f"{api_url}/v1/namespaces",
+        json={"namespace": ["db"]},
+        headers={"Content-Type": "application/json"})
+
+    # Create table with primary key
+    response = requests.post(
+        f"{api_url}/v1/namespaces/db/tables",
+        json=table_data,
+        headers={"Content-Type": "application/json"})

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   The `requests.post` call to create the table is missing a `timeout` 
parameter. Adding a timeout prevents the request from hanging indefinitely.
   
   ```python
       response = requests.post(
           f"{api_url}/v1/namespaces/db/tables",
           json=table_data,
           headers={"Content-Type": "application/json"},
           timeout=10)
   ```



##########
sdks/python/apache_beam/yaml/integration_tests.py:
##########
@@ -556,6 +561,73 @@ def temp_oracle_database():
     yield f"jdbc:oracle:thin:system/oracle@localhost:{port}/XEPDB1"
 
 
[email protected]
+def temp_iceberg_table_with_pk(table_data):
+
+  # Create a temp dir that will be shared between host and container.
+  # We use the exact same path on both to avoid path mapping issues.
+  # We create it in the current working directory (workspace) because
+  # Docker in GitHub Actions often cannot mount directories from /tmp.
+  temp_dir = tempfile.mkdtemp(dir=os.getcwd())
+  os.chmod(temp_dir, 0o777)
+
+  # Start the Iceberg REST catalog container
+  container = DockerContainer("tabulario/iceberg-rest:0.6.0")
+  container.with_exposed_ports(8181)
+  container.with_volume_mapping(temp_dir, temp_dir, mode='rw')
+  container.with_env("HADOOP_USER_NAME", "iceberg")
+  container.with_env("CATALOG_WAREHOUSE", temp_dir)
+  container.with_env(
+      "CATALOG_IO__IMPL", "org.apache.iceberg.hadoop.HadoopFileIO")
+
+  try:
+    container.start()
+
+    ip = container.get_container_host_ip()
+    port = container.get_exposed_port(8181)
+    api_url = f"http://{ip}:{port}";
+
+    # Poll the REST API until it is ready
+    for _ in range(30):
+      try:
+        response = requests.get(f"{api_url}/v1/config")
+        if response.status_code == 200:
+          break
+      except requests.exceptions.ConnectionError:
+        pass
+      time.sleep(1)
+    else:
+      raise RuntimeError("Iceberg REST catalog failed to start in time.")
+
+    # Create namespace 'db'
+    requests.post(
+        f"{api_url}/v1/namespaces",
+        json={"namespace": ["db"]},
+        headers={"Content-Type": "application/json"})

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   The `requests.post` call to create the namespace is missing a `timeout` 
parameter and does not check the response status code. If the namespace 
creation fails, the subsequent table creation will fail with a less clear 
error. Adding a timeout and calling `raise_for_status()` improves reliability 
and error reporting.
   
   ```suggestion
       response = requests.post(
           f"{api_url}/v1/namespaces",
           json={"namespace": ["db"]},
           headers={"Content-Type": "application/json"},
           timeout=10)
       response.raise_for_status()
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to