Shubham-Jha-GT opened a new issue, #5369:
URL: https://github.com/apache/iceberg/issues/5369
I'm trying to read data from an Iceberg table. The data is in ORC format and
partitioned by a column. I'm getting this error:
> `AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException:
Unable to fetch table temp_tag_thrshld_iceberg. StorageDescriptor#InputFormat
cannot be null for table: temp_tag_thrshld_iceberg (Service: null; Status Code:
0; Error Code: null; Request ID: null; Proxy: null)`
This is my code:
`spark = SparkSession.builder.config("spark.driver.memory",
"25g").appName(app_name).getOrCreate()`
`temp_tag_thrshld_data = spark.sql("SELECT * FROM
dev_db.temp_tag_thrshld_iceberg")`
If I replace the query with `spark.sql("SELECT * FROM a_normal_athena_table")`, the code
runs fine. I'm also not able to read the data directly from S3: it's in ORC
format with Snappy compression, and I don't get any results (I'm probably missing
the correct framework to read ORC from S3 directly, but that's another issue for
another day).
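For comparison, the session above is built without any Iceberg catalog settings. The sketch below shows how an Iceberg catalog backed by Glue is typically configured in PySpark. It is not a confirmed fix for this error. The catalog name `glue_catalog`, the warehouse path, and the helper names `iceberg_glue_conf` and `build_session` are assumptions for illustration, and the Iceberg Spark runtime and AWS dependencies are assumed to be on the classpath already.

```python
# Hypothetical values: the catalog name is arbitrary, and the warehouse path
# is only a guess based on the table location shown in this issue.
CATALOG = "glue_catalog"
WAREHOUSE = "s3://dev_db/athena-tables"


def iceberg_glue_conf(catalog=CATALOG, warehouse=WAREHOUSE):
    """Return Spark settings that register an Iceberg catalog backed by Glue."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.warehouse": warehouse,
    }


def build_session(app_name, conf=None):
    """Create a SparkSession with the given settings applied."""
    # Imported lazily so the settings can be inspected without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in (conf or iceberg_glue_conf()).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Usage (needs a Spark environment with the Iceberg jars available):
#   spark = build_session("iceberg-read")
#   spark.sql("SELECT * FROM glue_catalog.dev_db.temp_tag_thrshld_iceberg")
```

With a catalog registered this way, the table is addressed as `<catalog>.<database>.<table>`, so the query goes through Iceberg's catalog rather than the default session catalog.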
I've tried validating my table using
`aws glue get-table --database-name dev_db --name temp_tag_thrshld_iceberg`
and this is the output I got -
> {
> "Table": {
> "Name": "temp_tag_thrshld_iceberg",
> "DatabaseName": "dev_db",
> "CreateTime": 1658864256.0,
> "UpdateTime": 1658864347.0,
> "Retention": 0,
> "StorageDescriptor": {
> "Columns": [
> {
> "Name": "tag",
> "Type": "int",
> "Parameters": {
> "iceberg.field.current": "true",
> "iceberg.field.id": "1",
> "iceberg.field.optional": "true"
> }
> },
> {
> "Name": "zipcode",
> "Type": "int",
> "Parameters": {
> "iceberg.field.current": "true",
> "iceberg.field.id": "2",
> "iceberg.field.optional": "true"
> }
> },
> {
> "Name": "threshold_max",
> "Type": "double",
> "Parameters": {
> "iceberg.field.current": "true",
> "iceberg.field.id": "3",
> "iceberg.field.optional": "true"
> }
> },
> {
> "Name": "level",
> "Type": "string",
> "Parameters": {
> "iceberg.field.current": "true",
> "iceberg.field.id": "4",
> "iceberg.field.optional": "true"
> }
> }
> ],
> "Location": "s3://dev_db/athena-tables/temp_tag_thrshld_iceberg",
> "Compressed": false,
> "NumberOfBuckets": 0,
> "SortColumns": [],
> "StoredAsSubDirectories": false
> },
> "TableType": "EXTERNAL_TABLE",
> "Parameters": {
> "metadata_location": "s3://dev_db/athena-tables/temp_tag_thrshld_iceberg/metadata/00001-0ee5fbc7-044e-439d-aa1e-d76935002ebd.metadata.json",
> "previous_metadata_location": "s3://dev_db/athena-tables/temp_tag_thrshld_iceberg/metadata/00000-3a8f33f0-fbef-48c3-b289-6021f62b8b8c.metadata.json",
> "table_type": "ICEBERG"
> },
> "CreatedBy": "IAM Details",
> "IsRegisteredWithLakeFormation": false,
> "CatalogId": "571708111280",
> "VersionId": "1"
> }
> }
>
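The output above has `"table_type": "ICEBERG"` under `Parameters` but no `InputFormat` key under `StorageDescriptor`, which matches the wording of the error. A small sketch of that check in Python follows. The helper name `describe_glue_table` is hypothetical, and `sample` is a trimmed copy of the output above.

```python
def describe_glue_table(get_table_output):
    """Summarize the fields of `aws glue get-table` output relevant to the error."""
    table = get_table_output["Table"]
    storage = table.get("StorageDescriptor", {})
    params = table.get("Parameters", {})
    return {
        # Iceberg tables in Glue are marked through the table_type parameter.
        "is_iceberg": params.get("table_type", "").upper() == "ICEBERG",
        # The Hive-style reader complains when this field is absent.
        "has_input_format": bool(storage.get("InputFormat")),
        "metadata_location": params.get("metadata_location"),
    }


# Trimmed copy of the get-table output shown in this issue.
sample = {
    "Table": {
        "Name": "temp_tag_thrshld_iceberg",
        "DatabaseName": "dev_db",
        "StorageDescriptor": {
            "Location": "s3://dev_db/athena-tables/temp_tag_thrshld_iceberg",
            "Compressed": False,
        },
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {
            "metadata_location": "s3://dev_db/athena-tables/temp_tag_thrshld_iceberg/metadata/00001-0ee5fbc7-044e-439d-aa1e-d76935002ebd.metadata.json",
            "table_type": "ICEBERG",
        },
    }
}

summary = describe_glue_table(sample)
print(summary)
```

To run it against the live table, save the CLI output to a file and pass `json.load(open(path))` to the helper instead of `sample`.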
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]