fischcheng opened a new issue, #1512:
URL: https://github.com/apache/polaris/issues/1512

   ### Describe the bug
   
   ### Description:
   When attempting to load a Polaris catalog using PyIceberg, the call to the `GET /api/catalog/v1/config?warehouse=<warehouse_path>` endpoint fails with an HTTP 404 error. Server logs indicate the reason is "Unable to find warehouse `<warehouse_path>`". However, querying the Management API (`GET /api/management/v1/catalogs/{catalog_name}`) confirms that a catalog does exist with an exactly matching `default-base-location` property.
   
   Why would the Catalog API endpoint `GET /api/catalog/v1/config` fail to find the warehouse (`s3://<warehouse_path>`) when the Management API (`GET /api/management/v1/catalogs/test_catalog`) confirms that this exact configuration exists? Is this a bug in version 0.9.0, or is there another configuration setting or permission requirement for the `/config` endpoint that I might be missing?
   
   ### To Reproduce
   
   
   ### Steps to Reproduce:
    
   1. Set up Docker Compose using a `docker-compose.yml` similar to the one below, based on `getting-started/eclipselink/docker-compose-minimum.yaml`. The key aspect is the `polaris-setup` service, which creates the initial catalog.
   ```yaml
   services:

     polaris:
       # IMPORTANT: the image MUST contain the Postgres JDBC driver and EclipseLink dependencies, see README for instructions
       image: apache/polaris:latest
       ports:
         # API port
         - "8181:8181"
         # Management port (metrics and health checks)
         - "8182:8182"
       environment:
         polaris.persistence.type: eclipse-link
         polaris.persistence.eclipselink.configuration-file: /deployments/config/eclipselink/persistence-minimum.xml
         polaris.realm-context.realms: POLARIS
         quarkus.otel.sdk.disabled: "true"
       volumes:
         - ../assets/eclipselink/:/deployments/config/eclipselink
       healthcheck:
         test: ["CMD", "curl", "http://localhost:8182/q/health"]
         interval: 2s
         timeout: 10s
         retries: 10
         start_period: 10s
   ```
   
   2. After spinning up, use the Polaris CLI to create a catalog:
   ```shell
   ./polaris \
     --client-id root \
     --client-secret s3cr3t \
     catalogs create \
     --storage-type S3 \
     --default-base-location "s3://my-lakehouse-bucket" \
     --role-arn "arn:aws:iam::<AWS_ACCOUNT_ID>:role/docker-iam-role" \
     test_catalog
   ```
   
   3. `docker-compose up`
   4. PyIceberg code:
   ```python
   from pyiceberg.catalog import load_catalog
   # from pyiceberg.exceptions import NoSuchCatalogError  # Or appropriate exception

   polaris_host = "<YOUR_POLARIS_EC2_DNS_OR_IP>"  # Redacted host
   polaris_api_uri = f"http://{polaris_host}:8181/api/catalog"  # Correct prefix found via logs
   s3_warehouse_location = "s3://my-lakehouse-bucket"  # Must match the default-base-location used above
   polaris_client_id = "root"
   polaris_client_secret = "<REDACTED_SECRET>"
   s3_role_arn_to_assume = "arn:aws:iam::<AWS_ACCOUNT_ID>:role/docker-iam-role"  # Redacted account ID
   aws_region = "us-east-1"

   catalog_properties = {
       "type": "rest",
       "uri": polaris_api_uri,
       "credential": f"{polaris_client_id}:{polaris_client_secret}",
       "warehouse": s3_warehouse_location,
       "py-io-impl": "pyiceberg.io.pyarrow.PyArrowFileIO",
       "s3.assume-role.arn": s3_role_arn_to_assume,
       "s3.assume-role.session-name": "pyiceberg-polaris-session",
       "s3.region": aws_region,
   }
   catalog = load_catalog(
       name="polaris_catalog",  # Logical name for this instance
       **catalog_properties,
   )
   ```
   
   ### Actual Behavior
   
   The `load_catalog` call fails. The underlying HTTP request `GET /api/catalog/v1/config?warehouse=s3%3A%2F%2Fmy-lakehouse-bucket` returns an HTTP 404 error:
   
   ```
   [EL Fine]: sql: ... --SELECT ... FROM ENTITIES_ACTIVE WHERE ... NAME = ?)) bind => [..., s3://my-lakehouse-bucket]
   INFO  [org.apa.pol.ser.exc.IcebergExceptionMapper] ... Handling runtimeException Unable to find warehouse s3://my-lakehouse-bucket
   INFO  [io.qua.htt.access-log] ... "GET /api/catalog/v1/config?warehouse=s3%3A%2F%2Fmy-lakehouse-bucket HTTP/1.1" 404 111
   ```
   
   ### Expected Behavior
   
   The `load_catalog` call should succeed and return a valid catalog object. The underlying call to `GET /api/catalog/v1/config?warehouse=s3://my-lakehouse-bucket` should return HTTP 200 OK with the catalog configuration.
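To take PyIceberg out of the picture, the failing request can be sent directly. The following is a hedged standard-library sketch and not part of Polaris or PyIceberg. `config_url` and `get_config` are hypothetical helper names. The token is assumed to come from the `oauth/tokens` call shown under Additional context, and the URL shape (`{uri}/v1/config?warehouse=...`) is copied from the access-log line above.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def config_url(catalog_uri: str, warehouse: str) -> str:
    """Hypothetical helper: build GET {uri}/v1/config with a percent-encoded warehouse."""
    # URL shape assumed from the access-log line in this report
    query = urllib.parse.urlencode({"warehouse": warehouse})
    return f"{catalog_uri.rstrip('/')}/v1/config?{query}"


def get_config(catalog_uri: str, warehouse: str, token: str) -> tuple[int, dict]:
    """Hypothetical helper: send GET /v1/config with a bearer token; return (status, body)."""
    request = urllib.request.Request(
        config_url(catalog_uri, warehouse),
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as err:
        # urlopen raises on non-2xx responses; keep the body so the server's message is visible
        raw = err.read().decode("utf-8", errors="replace")
        try:
            return err.code, json.loads(raw)
        except json.JSONDecodeError:
            return err.code, {"raw": raw}
```

With the placeholders substituted, `get_config("http://<POLARIS_HOST>:8181/api/catalog", "s3://my-lakehouse-bucket", token)` would be expected to return the same 404 recorded in the log. That would indicate the failure does not depend on the PyIceberg client.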
   
   ### Additional context
   
   Querying the Management API indicates that the catalog configuration is correct:
   
   ```shell
   # Get a token (replace root:<REDACTED_SECRET> and {HOST})
   ACCESS_TOKEN=$(curl -s -X POST -u "root:<REDACTED_SECRET>" \
     -H "Content-Type: application/x-www-form-urlencoded" \
     -d "grant_type=client_credentials" \
     "http://{HOST}:8181/api/catalog/v1/oauth/tokens" | jq -r .access_token)

   # Query the Management API for a specific catalog (replace {HOST})
   curl -s -H "Authorization: Bearer ${ACCESS_TOKEN}" -H 'Accept: application/json' \
     "http://{HOST}:8181/api/management/v1/catalogs/test_catalog" | jq .
   ```
   
   ```
   {
     "type": "INTERNAL",
     "name": "test_catalog",
     "properties": {
       "default-base-location": "s3://my-lakehouse-bucket" // <-- EXACT MATCH!
     },
     "createTimestamp": 1746209705140,
     "lastUpdateTimestamp": 1746209705140,
     "entityVersion": 1,
     "storageConfigInfo": {
       "roleArn": "arn:aws:iam::<AWS_ACCOUNT_ID>:role/docker-iam-role", // Redacted
       "externalId": null,
       "userArn": null,
       "region": null,
       "storageType": "S3",
       "allowedLocations": [
         "s3://my-lakehouse-bucket"
       ]
     }
   }
   ```
   
   This output shows that the `default-base-location` property of `test_catalog` is set to `s3://my-lakehouse-bucket`, exactly matching the `warehouse` value passed to PyIceberg.
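The curl checks above can also be scripted from the same Python environment that runs PyIceberg. The following is a minimal standard-library sketch under stated assumptions. `fetch_token`, `get_catalog` and `base_location_matches` are hypothetical helpers, not Polaris or PyIceberg APIs. `base_url` is assumed to be `http://{HOST}:8181`. The token request simply mirrors the curl command (client-credentials grant with HTTP Basic auth).

```python
import base64
import json
import urllib.parse
import urllib.request


def fetch_token(base_url: str, client_id: str, client_secret: str) -> str:
    """Hypothetical helper mirroring the curl token request (Basic auth, client_credentials)."""
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    data = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    request = urllib.request.Request(
        f"{base_url}/api/catalog/v1/oauth/tokens",
        data=data,
        headers={
            "Authorization": f"Basic {creds}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["access_token"]


def get_catalog(base_url: str, name: str, token: str) -> dict:
    """Hypothetical helper: GET /api/management/v1/catalogs/{name} and parse the JSON."""
    request = urllib.request.Request(
        f"{base_url}/api/management/v1/catalogs/{urllib.parse.quote(name)}",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def base_location_matches(catalog: dict, warehouse: str) -> bool:
    """True if the catalog's default-base-location equals the given warehouse string."""
    return catalog.get("properties", {}).get("default-base-location") == warehouse
```

For the Management API output shown above, `base_location_matches(catalog, "s3://my-lakehouse-bucket")` would return `True`.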
   
   Troubleshooting Steps Taken:
   
   - Verified the API path prefix is `/api/catalog/v1/` via server logs, and updated the PyIceberg `uri` accordingly.
   - Verified that authentication works (a token can be obtained, and requests get past 401 when the token is valid).
   - Verified the server configuration using the Management API (the output above confirms that `default-base-location` matches).
   - Deleted and recreated the catalog using curl against the Management API, ensuring the correct `default-base-location` was specified in the payload. The issue persists.
   - Verified that Polaris starts cleanly (no obvious errors in the startup logs).
   
   
   ### System information
   
   - Polaris version: apache/polaris:latest (0.9.0)
   - Deployment: Docker Compose, based on the `getting-started/eclipselink/docker-compose-minimum.yaml` structure
   - Database: Postgres (via the EclipseLink setup, using an existing RDS Postgres instance)
   - Client: PyIceberg 0.9.0 using `pyiceberg.catalog.load_catalog`
   - Python version: 3.11
   - Host OS: Polaris runs on an Ubuntu EC2 instance via Docker Compose; PyIceberg runs on macOS.
   


-- 
This is an automated message from the Apache Git Service.