arunb2w opened a new issue, #5867:
URL: https://github.com/apache/iceberg/issues/5867
### Apache Iceberg version
0.14.0
### Query engine
EMR
### Please describe the bug 🐞
Facing an error when creating an Iceberg table on EMR using the Glue catalog.
Spark version: 3.2.1
Iceberg version: 0.14.0
**Sample code:**
```
from pyspark.sql import SparkSession

catalog = "glue_dev"
warehouse_path = "s3_bucket"
database = "test"
table_name = "EPAYMENT"

spark = SparkSession \
    .builder \
    .config(f'spark.sql.catalog.{catalog}',
            'org.apache.iceberg.spark.SparkCatalog') \
    .config(f'spark.sql.catalog.{catalog}.warehouse',
            f'{warehouse_path}') \
    .config(f'spark.sql.catalog.{catalog}.catalog-impl',
            'org.apache.iceberg.aws.glue.GlueCatalog') \
    .config(f'spark.sql.catalog.{catalog}.io-impl',
            'org.apache.iceberg.aws.s3.S3FileIO') \
    .config('spark.sql.extensions',
            'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.catalog.spark_catalog',
            'org.apache.iceberg.spark.SparkSessionCatalog') \
    .config('spark.sql.catalog.spark_catalog.type', 'hive') \
    .appName("IcebergDatalake") \
    .getOrCreate()

df = spark.createDataFrame([
    ("100", "2015-01-01", "2015-01-01T13:51:39.340396Z"),
    ("101", "2015-01-01", "2015-01-01T12:14:58.597216Z"),
    ("102", "2015-01-01", "2015-01-01T13:51:40.417052Z"),
    ("103", "2015-01-01", "2015-01-01T13:51:40.519832Z")
], ["id", "creation_date", "last_update_time"])

df.writeTo(f"{catalog}.{database}.{table_name}").using("iceberg").create()
```
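One thing I noticed while reducing this: the table name `EPAYMENT` is uppercase, and I believe the Glue catalog validates identifiers against Glue's naming rules (lowercase letters, digits, and underscores only). That is my assumption, not something I confirmed in the Iceberg source. A quick standalone check of that assumed rule, with the regex being my approximation rather than the actual validation pattern:

```python
import re

# Assumed Glue naming rule: lowercase letters, digits, underscores.
# (Approximation for illustration -- not copied from the Iceberg source.)
GLUE_NAME_PATTERN = re.compile(r"^[a-z0-9_]+$")

def is_valid_glue_name(name: str) -> bool:
    """Return True if the name conforms to the assumed Glue naming rule."""
    return GLUE_NAME_PATTERN.match(name) is not None

print(is_valid_glue_name("EPAYMENT"))  # uppercase name fails the check
print(is_valid_glue_name("epayment"))  # lowercase name passes
```

If that assumption holds, lowercasing the table name (or the database name, if it were mixed-case) would be a possible workaround.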
**Spark command used to run:**
`spark-submit --deploy-mode cluster --packages
org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.14.0,software.amazon.awssdk:bundle:2.17.257,software.amazon.awssdk:url-connection-client:2.17.257
--conf spark.yarn.submit.waitAppCompletion=true --conf
"spark.executor.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=\"/opt/spark\"" --conf spark.dynamicAllocation.enabled=true
--conf spark.executor.maxMemory=32g --conf
spark.dynamicAllocation.executorIdleTimeout=300 --conf
spark.shuffle.service.enabled=true --driver-memory 8g --num-executors 1
--executor-memory 32g --executor-cores 5 iceberg_main.py`
**Error stacktrace:**
```
Traceback (most recent call last):
File "iceberg_main.py", line 899, in <module>
bootstrap_table(tableName, spark, write_type, is_local_run,
hive_sync_enabled, database, catalog)
File "iceberg_main.py", line 428, in bootstrap_table
bootstrap_to_iceberg(table_name, write_type, spark_session,
is_local_run, hive_sync_enabled, database, catalog, stacks)
File "iceberg_main.py", line 407, in bootstrap_to_iceberg
df.writeTo(f"{catalog}.{database}." +
table_name).using("iceberg").create()
File
"/mnt/yarn/usercache/hadoop/appcache/application_1664278990474_0004/container_1664278990474_0004_01_000001/pyspark.zip/pyspark/sql/readwriter.py",
line 1129, in create
File
"/mnt/yarn/usercache/hadoop/appcache/application_1664278990474_0004/container_1664278990474_0004_01_000001/py4j-0.10.9.3-src.zip/py4j/java_gateway.py",
line 1322, in __call__
File
"/mnt/yarn/usercache/hadoop/appcache/application_1664278990474_0004/container_1664278990474_0004_01_000001/pyspark.zip/pyspark/sql/utils.py",
line 117, in deco
pyspark.sql.utils.IllegalArgumentException: Invalid table identifier:
test.EPAYMENT
```
Please provide insights on what I am missing.
The same code works fine if I use the Hadoop catalog instead of Glue.
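For reference, this is roughly the Hadoop-catalog configuration I used for the working comparison (reconstructed from memory, since the actual session setup is not shown above; the catalog name `hadoop_dev` is just a placeholder). Built as a plain dict so each entry can be applied with `.config(k, v)`:

```python
def hadoop_catalog_conf(catalog: str, warehouse_path: str) -> dict:
    """Spark conf entries for an Iceberg Hadoop catalog (sketch, assumed setup)."""
    return {
        f"spark.sql.catalog.{catalog}": "org.apache.iceberg.spark.SparkCatalog",
        # Hadoop catalog: table metadata lives under the warehouse path,
        # so no Glue (and no name validation against Glue rules) is involved.
        f"spark.sql.catalog.{catalog}.type": "hadoop",
        f"spark.sql.catalog.{catalog}.warehouse": warehouse_path,
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    }

conf = hadoop_catalog_conf("hadoop_dev", "s3://s3_bucket/warehouse")
```

With this configuration the same `df.writeTo(...).using("iceberg").create()` call succeeds, which is why I suspect something Glue-specific.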
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]