kbendick commented on issue #5515:
URL: https://github.com/apache/iceberg/issues/5515#issuecomment-1215361026

   You mentioned that you can get the location and the `currentSnapshot`.
   
   I don't think there's anything wrong with your table. If you look at the error message, it says that one of the file groups failed to be rewritten. This _usually_ happens when a concurrent write or other operation on the table prevents the data rewrite from proceeding (because committing it would break ACID guarantees).
   
   However, I do see that the final error message says the table `hive.wrk.my_table` was not found.
   
   When you use `spark.sql("show tables in hive")` or `spark.sql("show tables 
in hive.wrk")`, are you able to see the table?
   
   Given that you didn't set the `uri` property shown in the first example at https://iceberg.apache.org/docs/latest/spark-configuration/#catalogs, I think you need to verify that the table is actually registered as `hive.wrk.my_table`.
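   
   As a minimal sketch of that configuration (the catalog name matches your `hive.` prefix, but the app name and metastore URI here are placeholder assumptions, not values from your setup):
   
   ```scala
   import org.apache.spark.sql.SparkSession
   
   // Hypothetical values -- substitute your real metastore URI.
   val spark = SparkSession.builder()
     .appName("rewrite-data-files")
     .config("spark.sql.catalog.hive", "org.apache.iceberg.spark.SparkCatalog")
     .config("spark.sql.catalog.hive.type", "hive")
     .config("spark.sql.catalog.hive.uri", "thrift://metastore-host:9083")
     .getOrCreate()
   ```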
   
   Additionally, I don't see where in the code you configured the Spark session. `SparkActions.get()` with no arguments falls back to the [current active Spark session](https://github.com/apache/iceberg/blob/ce5128f09cc697455e76af08ce6ce3c9c5b08b70/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/SparkActions.java#L46-L48), so that session needs to be properly initialized, including the catalog configuration, before you call it.
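   
   For example, a sketch of passing the session explicitly instead (assuming the `spark` session from the sketch above; the table name is illustrative):
   
   ```scala
   import org.apache.iceberg.spark.Spark3Util
   import org.apache.iceberg.spark.actions.SparkActions
   
   // Load the Iceberg table through the configured catalog.
   val table = Spark3Util.loadIcebergTable(spark, "hive.wrk.my_table")
   
   // Pass the session explicitly rather than relying on SparkSession.active().
   SparkActions.get(spark)
     .rewriteDataFiles(table)
     .execute()
   ```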
   
   So I'd suggest:
   1. Ensure that the program, as written, can see the table when you use 
`spark.sql("SHOW TABLES IN ....")`.
   2. Consider passing the URI directly to the catalog properties when 
configuring it.
   3. Make sure that `wrk` is really the namespace, and that the table isn't 
_named_ `wrk.my_table`.
   
   If you share how you configured and submitted the program, check the output of `spark.sql("SHOW TABLES IN ....")`, and then run `DESCRIBE TABLE EXTENDED ...` on that table, that should help narrow this down.
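   
   Something like the following (the namespace and table names are placeholders for your own):
   
   ```scala
   // Confirm the catalog can see the namespace and the table.
   spark.sql("SHOW TABLES IN hive.wrk").show(truncate = false)
   
   // Inspect the table's location, provider, and other metadata.
   spark.sql("DESCRIBE TABLE EXTENDED hive.wrk.my_table").show(truncate = false)
   ```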
   
   I think it's likely that you just need to initialize the Spark session properly for the `spark` object (in order to use the `SparkActions` provider), i.e. `SparkSession.builder().....getOrCreate()` with your catalog configuration set.

