[GitHub] [iceberg] Shubham-Jha-GT opened a new issue, #5369: Unable to query Iceberg table from PySpark script in AWS Glue

GitBox Wed, 27 Jul 2022 07:57:44 -0700


Shubham-Jha-GT opened a new issue, #5369:
URL: https://github.com/apache/iceberg/issues/5369


   I'm trying to read data from an Iceberg table. The data is in ORC format and 
partitioned by a column. I'm getting this error:
   
   > `AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Unable to fetch table temp_tag_thrshld_iceberg. StorageDescriptor#InputFormat 
cannot be null for table: temp_tag_thrshld_iceberg (Service: null; Status Code: 
0; Error Code: null; Request ID: null; Proxy: null)`
   
   This is my code:
   `spark = SparkSession.builder.config("spark.driver.memory", "25g").appName(app_name).getOrCreate()`
   `temp_tag_thrshld_data = spark.sql("SELECT * FROM dev_db.temp_tag_thrshld_iceberg")`
   
   If I replace the query with `spark.sql("SELECT * FROM a_normal_athena_table")`, 
the code runs fine. I'm also not able to read the data directly from S3: it's in 
ORC format with Snappy compression, and I don't get any results (I'm probably 
missing the correct framework to read S3 ORC directly, but that's another issue 
for another day).
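
   One possible reading of the error, stated here as an assumption and not a confirmed diagnosis: the query is resolved through Spark's default Hive-style catalog client, which expects `StorageDescriptor#InputFormat`, whereas Iceberg tables are normally read through an Iceberg catalog. A minimal sketch of the session settings for that route follows. The catalog name `glue_catalog`, the warehouse path, and the helper names `iceberg_glue_conf` and `apply_conf` are hypothetical, and the Iceberg Spark runtime and AWS jars are assumed to be available to the Glue job.

```python
def iceberg_glue_conf(catalog_name: str, warehouse: str) -> dict:
    """Build Spark settings that register an Iceberg catalog backed by AWS Glue.

    catalog_name and warehouse are illustrative values chosen by the caller.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg's SQL extensions in the session.
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers the named catalog as an Iceberg catalog.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Uses the Glue Data Catalog as the Iceberg catalog implementation.
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        # Reads and writes table files through Iceberg's S3 FileIO.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.warehouse": warehouse,
    }


def apply_conf(builder, conf: dict):
    """Apply each key/value pair to a SparkSession.builder-like object."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder


# Usage inside the Glue job (assumes pyspark is importable there):
#
#   from pyspark.sql import SparkSession
#   conf = iceberg_glue_conf("glue_catalog", "s3://dev_db/athena-tables/")  # path is an assumption
#   spark = apply_conf(SparkSession.builder.appName(app_name), conf).getOrCreate()
#   df = spark.sql("SELECT * FROM glue_catalog.dev_db.temp_tag_thrshld_iceberg")
```

   With this setup the table is addressed with the catalog name as a prefix, so the lookup goes through Iceberg's metadata and not through the Hive-style storage descriptor.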
   
   I've tried validating my table using 
   `aws glue get-table --database-name dev_db --name temp_tag_thrshld_iceberg`
   
   and this is the output I got - 
   
   > {
   >     "Table": {
   >         "Name": "temp_tag_thrshld_iceberg",
   >         "DatabaseName": "dev_db",
   >         "CreateTime": 1658864256.0,
   >         "UpdateTime": 1658864347.0,
   >         "Retention": 0,
   >         "StorageDescriptor": {
   >             "Columns": [
   >                 {
   >                     "Name": "tag",
   >                     "Type": "int",
   >                     "Parameters": {
   >                         "iceberg.field.current": "true",
   >                         "iceberg.field.id": "1",
   >                         "iceberg.field.optional": "true"
   >                     }
   >                 },
   >                 {
   >                     "Name": "zipcode",
   >                     "Type": "int",
   >                     "Parameters": {
   >                         "iceberg.field.current": "true",
   >                         "iceberg.field.id": "2",
   >                         "iceberg.field.optional": "true"
   >                     }
   >                 },
   >                 {
   >                     "Name": "threshold_max",
   >                     "Type": "double",
   >                     "Parameters": {
   >                         "iceberg.field.current": "true",
   >                         "iceberg.field.id": "3",
   >                         "iceberg.field.optional": "true"
   >                     }
   >                 },
   >                 {
   >                     "Name": "level",
   >                     "Type": "string",
   >                     "Parameters": {
   >                         "iceberg.field.current": "true",
   >                         "iceberg.field.id": "4",
   >                         "iceberg.field.optional": "true"
   >                     }
   >                 }
   >             ],
   >             "Location": "s3://dev_db/athena-tables/temp_tag_thrshld_iceberg",
   >             "Compressed": false,
   >             "NumberOfBuckets": 0,
   >             "SortColumns": [],
   >             "StoredAsSubDirectories": false
   >         },
   >         "TableType": "EXTERNAL_TABLE",
   >         "Parameters": {
   >             "metadata_location": "s3://dev_db/athena-tables/temp_tag_thrshld_iceberg/metadata/00001-0ee5fbc7-044e-439d-aa1e-d76935002ebd.metadata.json",
   >             "previous_metadata_location": "s3://dev_db/athena-tables/temp_tag_thrshld_iceberg/metadata/00000-3a8f33f0-fbef-48c3-b289-6021f62b8b8c.metadata.json",
   >             "table_type": "ICEBERG"
   >         },
   >         "CreatedBy": "IAM Details",
   >         "IsRegisteredWithLakeFormation": false,
   >         "CatalogId": "571708111280",
   >         "VersionId": "1"
   >     }
   > }
   > 
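
   The output above has `table_type` set to `ICEBERG` and no `InputFormat` key under `StorageDescriptor`, which lines up with the wording of the error message. As a rough sketch, the same check can be scripted against the JSON printed by `aws glue get-table`. The function name `summarize_glue_table` and the trimmed `SAMPLE` document are illustrative only.

```python
import json


def summarize_glue_table(get_table_json: str) -> dict:
    """Report the fields of a Glue table definition relevant to the error."""
    table = json.loads(get_table_json)["Table"]
    storage = table.get("StorageDescriptor", {})
    params = table.get("Parameters", {})
    return {
        "name": table.get("Name"),
        # Iceberg tables registered in Glue carry table_type=ICEBERG.
        "is_iceberg": params.get("table_type", "").upper() == "ICEBERG",
        # Hive-style readers expect an InputFormat in the storage descriptor.
        "has_input_format": bool(storage.get("InputFormat")),
        "metadata_location": params.get("metadata_location"),
    }


# Trimmed copy of the get-table output from this report (columns omitted).
SAMPLE = """
{
  "Table": {
    "Name": "temp_tag_thrshld_iceberg",
    "DatabaseName": "dev_db",
    "StorageDescriptor": {
      "Location": "s3://dev_db/athena-tables/temp_tag_thrshld_iceberg",
      "Compressed": false
    },
    "TableType": "EXTERNAL_TABLE",
    "Parameters": {"table_type": "ICEBERG"}
  }
}
"""

if __name__ == "__main__":
    print(summarize_glue_table(SAMPLE))
```

   For the sample, `is_iceberg` comes back true and `has_input_format` comes back false.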


