[GitHub] [iceberg] dthauvin opened a new issue, #7317: EMR 6.10.0 Cannot migrate a table from a non-Iceberg Spark Session Catalog. Found spark_catalog

via GitHub Mon, 10 Apr 2023 12:37:06 -0700


dthauvin opened a new issue, #7317:
URL: https://github.com/apache/iceberg/issues/7317


   ### Apache Iceberg version
   
   1.1.0
   
   ### Query engine
   
   EMR
   
   ### Please describe the bug 🐞
   
   Hello 
   
   I am currently using Spark on emr-6.10.0, which embeds Iceberg 1.1.0-amzn-0.
   
   I tried running the migrate procedure to replace an existing table with 
an Iceberg table, but I get the following error:
   
   
   code : 
   
   ```python
   spark.sql("""
   CALL glue.system.migrate('spark_catalog.nyc.taxis_origin')
   """)
   ``` 
   
   stack trace: 
   
   ```java
   An error was encountered:
   Cannot migrate a table from a non-Iceberg Spark Session Catalog. Found spark_catalog of class org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog as the source catalog.
   Traceback (most recent call last):
     File "/mnt/yarn/usercache/livy/appcache/application_1681132811497_0003/container_1681132811497_0003_01_000001/pyspark.zip/pyspark/sql/session.py", line 1034, in sql
       return DataFrame(self._jsparkSession.sql(sqlQuery), self)
     File "/mnt/yarn/usercache/livy/appcache/application_1681132811497_0003/container_1681132811497_0003_01_000001/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1322, in __call__
       answer, self.gateway_client, self.target_id, self.name)
     File "/mnt/yarn/usercache/livy/appcache/application_1681132811497_0003/container_1681132811497_0003_01_000001/pyspark.zip/pyspark/sql/utils.py", line 196, in deco
       raise converted from None
   pyspark.sql.utils.IllegalArgumentException: Cannot migrate a table from a non-Iceberg Spark Session Catalog. Found spark_catalog of class org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog as the source catalog.
   
   ```
   
   My EMR cluster uses the following spark-defaults configuration:
   
   ```json
   {
       "spark.sql.catalog.glue": "org.apache.iceberg.spark.SparkCatalog",
       "spark.sql.catalog.glue.warehouse": "s3://mybucket/warehouseiceberg/",
       "spark.sql.catalog.glue.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
       "spark.sql.catalog.glue.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
       "spark.sql.catalog.glue.lock.table": "mytable-iceberg-lock",
       "spark.sql.catalog.glue.lakeformation-enabled": "false",
       "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
   }
    
   ```
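
The error message indicates that `migrate` found `spark_catalog` backed by Spark's built-in `V2SessionCatalog` rather than an Iceberg session catalog. The sketch below checks a spark-defaults style mapping for that condition. It assumes the Iceberg wrapper class is `org.apache.iceberg.spark.SparkSessionCatalog` (from Iceberg's Spark module); the helper and constant names are hypothetical and not part of Iceberg or EMR. Whether adding this setting resolves the problem on EMR with Glue is not confirmed here.

```python
# Hypothetical helper (not an Iceberg or EMR API): inspects a spark-defaults
# style mapping and reports whether spark_catalog is Iceberg-backed.
ICEBERG_SESSION_CATALOG = "org.apache.iceberg.spark.SparkSessionCatalog"
SESSION_CATALOG_KEY = "spark.sql.catalog.spark_catalog"


def session_catalog_is_iceberg(conf: dict) -> bool:
    """Return True if spark_catalog is set to Iceberg's SparkSessionCatalog."""
    return conf.get(SESSION_CATALOG_KEY) == ICEBERG_SESSION_CATALOG


# The spark-defaults from this report: only the `glue` catalog is defined,
# so spark_catalog falls back to Spark's default session catalog.
reported_conf = {
    "spark.sql.catalog.glue": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.glue.warehouse": "s3://mybucket/warehouseiceberg/",
    "spark.sql.catalog.glue.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
    "spark.sql.catalog.glue.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "spark.sql.catalog.glue.lock.table": "mytable-iceberg-lock",
    "spark.sql.catalog.glue.lakeformation-enabled": "false",
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
}

print(session_catalog_is_iceberg(reported_conf))
```

In a live session, the mapping could presumably be built with `dict(spark.sparkContext.getConf().getAll())`, although settings changed at runtime may not appear there.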
   
   and the following Iceberg EMR configuration:
   
   ```json
   {
       "Classification":"iceberg-defaults",
       "Properties":{"iceberg.enabled":"true"}
   }
   ```
   
   By the way, the `add_files` procedure works well.
   
   ``` 
   spark.sql("""
   CREATE TABLE IF NOT EXISTS glue.nyc.taxis_migrated(
     `VendorID` BIGINT,
     `tpep_pickup_datetime` TIMESTAMP,
     `tpep_dropoff_datetime` TIMESTAMP,
     `passenger_count` DOUBLE,
     `trip_distance` DOUBLE,
     `RatecodeID` DOUBLE,
     `store_and_fwd_flag` STRING,
     `PULocationID` BIGINT,
     `DOLocationID` BIGINT,
     `payment_type` BIGINT,
     `fare_amount` DOUBLE,
     `extra` DOUBLE,
     `mta_tax` DOUBLE,
     `tip_amount` DOUBLE,
     `tolls_amount` DOUBLE,
     `improvement_surcharge` DOUBLE,
     `total_amount` DOUBLE,
     `congestion_surcharge` DOUBLE,
     `airport_fee` DOUBLE)
   USING iceberg
   LOCATION 's3://mybucket/tripdata_migrated/'
   TBLPROPERTIES(
     'write.parquet.compression-codec'='snappy'
   )
   """)
   
   
   spark.sql("""CALL glue.system.add_files(
     table => 'nyc.taxis_migrated',
     source_table => 'spark_catalog.nyc.taxis_origin'
   )""")
   
   DataFrame[added_files_count: bigint]
   
   ``` 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

