ignaski opened a new issue, #8265:
URL: https://github.com/apache/iceberg/issues/8265
### Apache Iceberg version
1.3.1 (latest release)
### Query engine
Spark
### Please describe the bug 🐞
Creating a table via Spark SQL respects the catalog's default table properties (`table-default.*`); however, they are not applied when the table is created via the DataFrame API.
The issue can be reproduced using the quickstart example:
```
spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.3.1 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.local.catalog-impl=org.apache.iceberg.rest.RESTCatalog \
  --conf spark.sql.catalog.local.uri=http://rest:8181 \
  --conf spark.sql.catalog.local.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
  --conf spark.sql.catalog.local.warehouse=s3a://warehouse/wh/ \
  --conf spark.sql.catalog.local.s3.endpoint=http://minio:9000 \
  --conf spark.sql.defaultCatalog=local \
  --conf spark.sql.catalog.local.table-default.write.metadata.delete-after-commit.enabled=true
```
Creation via Spark SQL:
```
scala> spark.sql("CREATE TABLE local.nyc.taxis (vendor_id bigint)
PARTITIONED BY (vendor_id);")
res0: org.apache.spark.sql.DataFrame = []
scala> spark.sql("show create table local.nyc.taxis").show(truncate=false)
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|createtab_stmt                                                                                                                                                                                                                                                                                                   |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|CREATE TABLE local.nyc.taxis (\n  vendor_id BIGINT)\nUSING iceberg\nPARTITIONED BY (vendor_id)\nLOCATION 's3://warehouse/nyc/taxis'\nTBLPROPERTIES (\n  'current-snapshot-id' = 'none',\n  'format' = 'iceberg/parquet',\n  'format-version' = '1',\n  'write.metadata.delete-after-commit.enabled' = 'true')\n|
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```
Creation via the DataFrame API:
```
scala> import org.apache.spark.sql.types._
import org.apache.spark.sql.types._
scala> import org.apache.spark.sql.Row
import org.apache.spark.sql.Row
scala> val schema = StructType( Array(
| StructField("vendor_id", LongType,true)
| ))
schema: org.apache.spark.sql.types.StructType = StructType(StructField(vendor_id,LongType,true))
scala> val df = spark.createDataFrame(spark.sparkContext.emptyRDD[Row],schema)
df: org.apache.spark.sql.DataFrame = [vendor_id: bigint]
scala> df.writeTo("local.nyc.taxis_df").create()
scala> spark.sql("show create table local.nyc.taxis_df").show(truncate=false)
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|createtab_stmt                                                                                                                                                                                                                                                                                      |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|CREATE TABLE local.nyc.taxis_df (\n  vendor_id BIGINT)\nUSING iceberg\nLOCATION 's3://warehouse/nyc/taxis_df'\nTBLPROPERTIES (\n  'created-at' = '2023-08-09T06:41:27.531135128Z',\n  'current-snapshot-id' = '6638980767440031836',\n  'format' = 'iceberg/parquet',\n  'format-version' = '1')\n|
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```
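A possible workaround until the default is honored (just a sketch, assuming the `DataFrameWriterV2.tableProperty` method; the table name `local.nyc.taxis_df2` is only an example) is to set the property explicitly on the writer:
```
// Workaround sketch (untested here): set the property explicitly per table
// instead of relying on the catalog's table-default.* configuration.
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row

val schema = StructType(Array(StructField("vendor_id", LongType, true)))
val df = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

df.writeTo("local.nyc.taxis_df2")  // hypothetical table name
  .tableProperty("write.metadata.delete-after-commit.enabled", "true")
  .create()
```
This only sidesteps the problem for a single table; the expectation is still that `table-default.*` properties are applied on every create path.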