davidshtian commented on issue #1449:
URL: https://github.com/apache/iceberg-python/issues/1449#issuecomment-2899736528

   > Hi [@davidshtian](https://github.com/davidshtian), I'm facing a similar issue. I even tried via Spark, but I'm getting a similar issue. I also found this interesting post; you can use logging to debug: https://dev.to/aws-builders/glue-iceberg-rest-api-and-pyiceberg-364g
   > In my case the issue is:
   > 
   > ```
   > warnings.warn(f"No preferred file implementation for scheme: {parsed_url.scheme}")
   > 2025-05-21 10:52:28,157 - pyiceberg.io - INFO - Defaulting to PyArrow FileIO
   > ```
   > 
   > And I think it has something to do with nested catalogs.
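
The quote suggests using logging to debug. Here is a minimal sketch of turning on PyIceberg's log output from a script. It assumes PyIceberg's loggers sit under the `pyiceberg` name, since the quoted line comes from `pyiceberg.io`. `enable_pyiceberg_debug_logging` is a hypothetical helper name. The format string reproduces the `time - logger - level - message` layout of the quoted log line. `logging.captureWarnings(True)` routes `warnings.warn(...)` messages, such as the "No preferred file implementation" warning, through the same handler.

```python
import logging

# Same layout as the quoted line: "<time> - <logger> - <level> - <message>"
LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def enable_pyiceberg_debug_logging(level: int = logging.DEBUG) -> None:
    """Hypothetical helper: print PyIceberg log records to stderr."""
    # Attach a stderr handler using LOG_FORMAT to the root logger (no-op if one exists)
    logging.basicConfig(format=LOG_FORMAT)
    # Re-emit warnings.warn(...) calls as log records on the "py.warnings" logger
    logging.captureWarnings(True)
    # Lower the threshold only for the "pyiceberg" logger hierarchy (e.g. pyiceberg.io)
    logging.getLogger("pyiceberg").setLevel(level)


if __name__ == "__main__":
    enable_pyiceberg_debug_logging()
    # ...then load the catalog and table as usual and watch the pyiceberg.* output
```

This raises the level only on the `pyiceberg` logger. Other libraries, such as `botocore`, stay at the default WARNING threshold, which keeps the output readable.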
   
   Thanks for your advice. I've tried a direct API call using `awscurl`:
   ```
   awscurl --service glue "https://glue.us-east-1.amazonaws.com/iceberg/v1/catalogs/<account id>:<catalog name>/dev/namespaces/public/tables/<table name>" | jq
   ```
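A rough Python equivalent of the `awscurl` call is sketched below for reference. It assumes `boto3` and `botocore` are installed and that credentials come from the default credential chain. The sketch signs the GET with botocore's `SigV4Auth` for the `glue` service, which is what `awscurl --service glue` does, and sends it with the standard library. `table_url` and `load_table_response` are hypothetical helper names. The catalog string is passed through verbatim, as in the command above. This has not been tested against Glue.

```python
import json
import urllib.request


def table_url(region: str, catalog: str, namespace: str, table: str) -> str:
    """Hypothetical helper: build the same URL as the awscurl command."""
    # `catalog` is used verbatim, e.g. "<account id>:<catalog name>/dev"
    return (
        f"https://glue.{region}.amazonaws.com/iceberg/v1/catalogs/{catalog}"
        f"/namespaces/{namespace}/tables/{table}"
    )


def load_table_response(url: str, region: str) -> dict:
    """Hypothetical helper: GET the table, signed with SigV4 for "glue"."""
    # Imported here so table_url() works without boto3 installed
    import boto3
    from botocore.auth import SigV4Auth
    from botocore.awsrequest import AWSRequest

    credentials = boto3.Session().get_credentials()  # default credential chain
    aws_request = AWSRequest(method="GET", url=url)
    # Adds Authorization / X-Amz-Date (and the session token header, if any)
    SigV4Auth(credentials, "glue", region).add_auth(aws_request)
    http_request = urllib.request.Request(url, headers=dict(aws_request.headers.items()))
    with urllib.request.urlopen(http_request) as response:
        return json.load(response)
```

For example, call `load_table_response(table_url("us-east-1", "<account id>:<catalog name>/dev", "public", "<table name>"), "us-east-1")`, filling in the placeholders as for `awscurl`.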
   
   The `awscurl` call returned a result like this:
   ```
   {
     "config": {
       "aws.server-side-capabilities.scan-planning": "true",
       "aws.glue.staging.data-transfer-role-arn": "xxx",
       "aws.glue.staging.location": "s3://redshift-staging-bucket-xxx/xxx:xxx/write/xxx/",
       "aws.glue.staging.expiration-ms": "1747882835000",
       "aws.glue.staging.session-token": "xxx",
       "aws.glue.staging.access-key-id": "xxx",
       "aws.glue.staging.secret-access-key": "xxx",
       "aws.server-side-capabilities.data-commit": "true"
     },
     "metadata": {
       "current-schema-id": 0,
       "current-snapshot-id": 0,
       "default-sort-order-id": 0,
       "default-spec-id": 0,
       "format-version": 2,
       "last-column-id": 2,
       "last-partition-id": 1000,
       "last-sequence-number": 0,
       "last-updated-ms": 0,
       "location": "tbl",
       "metadata-log": [],
       "partition-specs": [
         {
           "fields": [],
           "spec-id": 0
         }
       ],
       "properties": {
         "aws.write.format": "RMS",
         "schema.name-mapping.default": "[ {\n  \"field-id\" : 1,\n  \"names\" : [ \"id\" ]\n}, {\n  \"field-id\" : 2,\n  \"names\" : [ \"name\" ]\n} ]"
       },
       "refs": {
         "main": {
           "snapshot-id": 0,
           "type": "branch"
         }
       },
       "schemas": [
         {
           "fields": [
             {
               "id": 1,
               "name": "id",
               "required": false,
               "type": "int"
             },
             {
               "id": 2,
               "name": "name",
               "required": false,
               "type": "string"
             }
           ],
           "schema-id": 0,
           "type": "struct"
         }
       ],
       "snapshot-log": [
         {
           "snapshot-id": 0,
           "timestamp-ms": 0
         }
       ],
       "snapshots": [
         {
           "manifest-list": "<table name>",
           "parent-snapshot-id": 0,
           "schema-id": 0,
           "sequence-number": 0,
           "snapshot-id": 0,
           "summary": {},
           "timestamp-ms": 0
         }
       ],
       "sort-orders": [
         {
           "fields": [],
           "order-id": 0
         }
       ],
       "statistics-files": [],
       "table-uuid": "xxx"
     }
   }
   ```
   
   The `"snapshots"` part of the metadata JSON looks weird: `manifest-list` is just the table name, and `schema-id` is always 0...
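
A small check over the parsed response can flag these odd values when comparing tables. `snapshot_oddities` is a hypothetical helper that works on the JSON above. It assumes a `manifest-list` normally holds a full file location with a URI scheme, such as `s3://...`. The quoted warning suggests PyIceberg picks its FileIO from the URL scheme, so a scheme-less value may be related to it, but that link is an assumption.

```python
from urllib.parse import urlparse


def snapshot_oddities(response: dict) -> list[str]:
    """Hypothetical helper: list suspicious snapshot fields in a table response."""
    problems = []
    for snap in response.get("metadata", {}).get("snapshots", []):
        sid = snap.get("snapshot-id")
        manifest_list = snap.get("manifest-list", "")
        # A manifest list is normally a full file location, e.g. s3://bucket/.../snap-*.avro
        if not urlparse(manifest_list).scheme:
            problems.append(f"snapshot {sid}: manifest-list {manifest_list!r} has no URI scheme")
        # Placeholder-looking values: ids and timestamps left at 0
        zero_fields = [k for k in ("snapshot-id", "schema-id", "timestamp-ms") if snap.get(k) == 0]
        if zero_fields:
            problems.append(f"snapshot {sid}: zero-valued {', '.join(zero_fields)}")
    return problems


if __name__ == "__main__":
    sample = {"metadata": {"snapshots": [{
        "manifest-list": "<table name>", "parent-snapshot-id": 0, "schema-id": 0,
        "sequence-number": 0, "snapshot-id": 0, "summary": {}, "timestamp-ms": 0,
    }]}}
    for line in snapshot_oddities(sample):
        print(line)
```

For the response above, it reports the scheme-less `manifest-list` and the zero-valued `snapshot-id`, `schema-id` and `timestamp-ms`.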


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

