dthauvin opened a new issue, #7317:
URL: https://github.com/apache/iceberg/issues/7317
### Apache Iceberg version
1.1.0
### Query engine
EMR
### Please describe the bug 🐞
Hello,
I am currently using Spark on emr-6.10.0, which embeds Iceberg
1.1.0-amzn-0.
I tried running the migrate procedure to replace an existing table with
an Iceberg table, but I get the following error:
Code:
```python
spark.sql("""
CALL glue.system.migrate('spark_catalog.nyc.taxis_origin')
""")
```
Stack trace:
```java
An error was encountered:
Cannot migrate a table from a non-Iceberg Spark Session Catalog. Found
spark_catalog of class
org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog as the source
catalog.
Traceback (most recent call last):
File
"/mnt/yarn/usercache/livy/appcache/application_1681132811497_0003/container_1681132811497_0003_01_000001/pyspark.zip/pyspark/sql/session.py",
line 1034, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self)
File
"/mnt/yarn/usercache/livy/appcache/application_1681132811497_0003/container_1681132811497_0003_01_000001/py4j-0.10.9.5-src.zip/py4j/java_gateway.py",
line 1322, in __call__
answer, self.gateway_client, self.target_id, self.name)
File
"/mnt/yarn/usercache/livy/appcache/application_1681132811497_0003/container_1681132811497_0003_01_000001/pyspark.zip/pyspark/sql/utils.py",
line 196, in deco
raise converted from None
pyspark.sql.utils.IllegalArgumentException: Cannot migrate a table from a
non-Iceberg Spark Session Catalog. Found spark_catalog of class
org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog as the source
catalog.
```
My EMR configuration sets the following parameters in the spark-defaults
classification:
```json
{
  "spark.sql.catalog.glue": "org.apache.iceberg.spark.SparkCatalog",
  "spark.sql.catalog.glue.warehouse": "s3://mybucket/warehouseiceberg/",
  "spark.sql.catalog.glue.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
  "spark.sql.catalog.glue.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
  "spark.sql.catalog.glue.lock.table": "mytable-iceberg-lock",
  "spark.sql.catalog.glue.lakeformation-enabled": "false",
  "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
}
```
And for the Iceberg EMR configuration:
```json
{
"Classification":"iceberg-defaults",
"Properties":{"iceberg.enabled":"true"}
}
```
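One thing that may be worth checking, as a minimal sketch: the spark-defaults above define the `glue` catalog but never set `spark.sql.catalog.spark_catalog`, which would be consistent with the `V2SessionCatalog` class named in the error. The snippet below assumes that the migrate procedure expects `spark_catalog` to be `org.apache.iceberg.spark.SparkSessionCatalog`; the `session_catalog_is_iceberg` helper and the `candidate_conf` dict are hypothetical names used only for illustration, and any extra catalog properties that Glue might need are not covered.

```python
# Assumption: the class the migrate procedure expects for spark_catalog.
ICEBERG_SESSION_CATALOG = "org.apache.iceberg.spark.SparkSessionCatalog"
SESSION_CATALOG_KEY = "spark.sql.catalog.spark_catalog"


def session_catalog_is_iceberg(conf):
    """Return True if spark_catalog is set to Iceberg's session catalog class."""
    return conf.get(SESSION_CATALOG_KEY) == ICEBERG_SESSION_CATALOG


# spark-defaults as reported in this issue: only the `glue` catalog is defined.
reported_conf = {
    "spark.sql.catalog.glue": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.glue.warehouse": "s3://mybucket/warehouseiceberg/",
    "spark.sql.catalog.glue.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
    "spark.sql.catalog.glue.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "spark.sql.catalog.glue.lock.table": "mytable-iceberg-lock",
    "spark.sql.catalog.glue.lakeformation-enabled": "false",
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
}

# Hypothetical variant that also replaces the built-in session catalog.
candidate_conf = dict(reported_conf)
candidate_conf[SESSION_CATALOG_KEY] = ICEBERG_SESSION_CATALOG

print(session_catalog_is_iceberg(reported_conf))
print(session_catalog_is_iceberg(candidate_conf))
```

On a live session, the same value could presumably be read with `spark.conf.get("spark.sql.catalog.spark_catalog", None)` instead of a hand-built dict.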
By the way, the `add_files` procedure works well:
```
spark.sql("""
CREATE TABLE IF NOT EXISTS glue.nyc.taxis_migrated(
`VendorID` BIGINT,
`tpep_pickup_datetime` TIMESTAMP,
`tpep_dropoff_datetime` TIMESTAMP,
`passenger_count` DOUBLE,
`trip_distance` DOUBLE,
`RatecodeID` DOUBLE,
`store_and_fwd_flag` STRING,
`PULocationID` BIGINT,
`DOLocationID` BIGINT,
`payment_type` BIGINT,
`fare_amount` DOUBLE,
`extra` DOUBLE,
`mta_tax` DOUBLE,
`tip_amount` DOUBLE,
`tolls_amount` DOUBLE,
`improvement_surcharge` DOUBLE,
`total_amount` DOUBLE,
`congestion_surcharge` DOUBLE,
`airport_fee` DOUBLE)
USING iceberg
LOCATION 's3://mybucket/tripdata_migrated/'
TBLPROPERTIES(
'write.parquet.compression-codec'='snappy'
)
""")
spark.sql("""CALL glue.system.add_files(
table => 'nyc.taxis_migrated',
source_table => 'spark_catalog.nyc.taxis_origin'
)""")
DataFrame[added_files_count: bigint]
```