can-sun opened a new issue, #5768:
URL: https://github.com/apache/iceberg/issues/5768
### Apache Iceberg version
0.13.0
### Query engine
Spark
### Please describe the bug 🐞
iceberg-spark3-runtime version: 0.13.0
I attempted to skip the Glue table name validation by setting
**glue.skip-name-validation** to true, but none of the Spark SQL statements
below succeeded.

Iceberg catalog properties:
```
spark-shell --packages $DEPENDENCIES \
  --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.my_catalog.glue.skip-name-validation=true \
  --conf spark.sql.catalog.my_catalog.warehouse=<s3-placeholder> \
  --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
  --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
```
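As a sanity check that the `--conf` flags actually reach the session (not part
of my original run), the property can be read back from the Spark conf inside
spark-shell:
```scala
// Sanity check inside spark-shell: confirm the catalog property was passed
// through. Expected to print "true" if the --conf flag took effect.
println(spark.conf.get("spark.sql.catalog.my_catalog.glue.skip-name-validation"))
```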
Spark SQL queries tried so far:
```
spark.sql("""CREATE TABLE IF NOT EXISTS my_catalog.db.`iceberg-table` (
    id string,
    creation_date string,
    last_update_time string)
  LOCATION '<my-s3-bucket>'
  TBLPROPERTIES ('table_type'='ICEBERG', 'format'='parquet', 'glue.skip-name-validation'=true)""")

spark.sql("""CREATE TABLE IF NOT EXISTS my_catalog.db.`iceberg-table` (
    id string,
    creation_date string,
    last_update_time string)
  USING iceberg
  OPTIONS ('glue.skip-name-validation'=true)
  LOCATION '<my-s3-bucket>'""")
```
Error stack trace:
```
java.lang.IllegalArgumentException: Invalid table identifier: db.iceberg-table
  at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:217)
  at org.apache.iceberg.BaseMetastoreCatalog$BaseMetastoreCatalogTableBuilder.<init>(BaseMetastoreCatalog.java:115)
  at org.apache.iceberg.BaseMetastoreCatalog.buildTable(BaseMetastoreCatalog.java:68)
  at org.apache.iceberg.spark.SparkCatalog.newBuilder(SparkCatalog.java:578)
  at org.apache.iceberg.spark.SparkCatalog.createTable(SparkCatalog.java:148)
  at org.apache.iceberg.spark.SparkCatalog.createTable(SparkCatalog.java:92)
```
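For what it's worth, the failure can presumably be reproduced without Spark SQL
at all, since the stack trace shows the check firing inside
`BaseMetastoreCatalog.buildTable`. A minimal sketch (the schema and property
map are illustrative, not from my actual run):
```scala
// Minimal sketch (illustrative): drive the same code path from the stack
// trace directly, with Spark SQL taken out of the picture.
import org.apache.iceberg.Schema
import org.apache.iceberg.aws.glue.GlueCatalog
import org.apache.iceberg.catalog.TableIdentifier
import org.apache.iceberg.types.Types

val props = new java.util.HashMap[String, String]()
props.put("warehouse", "<s3-placeholder>")
props.put("glue.skip-name-validation", "true")

val catalog = new GlueCatalog()
catalog.initialize("my_catalog", props)

val schema = new Schema(
  Types.NestedField.optional(1, "id", Types.StringType.get()))

// If the property is not honored, this should throw the same
// IllegalArgumentException: Invalid table identifier: db.iceberg-table
catalog.buildTable(TableIdentifier.of("db", "iceberg-table"), schema).create()
```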
Besides the DDL attempts, I also tried to write data to the Glue table, which
failed as well. So the Iceberg table cannot be created via Spark, although I
can create such a table using an Athena query.
```
df.writeTo("my_catalog.db.`iceberg-table`").append()
```
This fails with a table-not-found error:
```
org.apache.spark.sql.AnalysisException: Table or view not found: my_catalog.db.`iceberg-table`;
'AppendData 'UnresolvedRelation [my_catalog, db, iceberg-table], [], false, true
+- Project [_1#3 AS id#10, _2#4 AS creation_date#11, _3#5 AS last_update_time#12]
   +- LocalRelation [_1#3, _2#4, _3#5]

  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:134)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:302)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:94)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:91)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:172)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:195)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:192)
  at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:90)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:192)
  at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:224)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
  at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:224)
  at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:90)
  at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:88)
  at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:95)
  at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:93)
  at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:136)
  at org.apache.spark.sql.DataFrameWriterV2.runCommand(DataFrameWriterV2.scala:194)
  at org.apache.spark.sql.DataFrameWriterV2.append(DataFrameWriterV2.scala:148)
```
I know I am not following the Glue/Athena best practices here:
https://docs.aws.amazon.com/athena/latest/ug/glue-best-practices.html. However,
for backwards-compatibility reasons, I am still trying to figure out whether it
is viable to use dashes in an Iceberg table name.
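One thing I could not rule out is whether the 0.13.0 runtime reads this
property at all. A rough check, assuming the key is exposed as a constant named
`GLUE_CATALOG_SKIP_NAME_VALIDATION` on `org.apache.iceberg.aws.AwsProperties`
(an assumption on my part), would be reflection against the runtime jar:
```scala
// Hedged sanity check: does the runtime jar define the skip-name-validation
// property at all? The field name below is an assumption about AwsProperties.
try {
  val field = Class.forName("org.apache.iceberg.aws.AwsProperties")
    .getField("GLUE_CATALOG_SKIP_NAME_VALIDATION")
  println(s"Property key defined in this runtime: ${field.get(null)}")
} catch {
  case _: NoSuchFieldException | _: ClassNotFoundException =>
    println("Property not defined in this runtime; the setting would be ignored.")
}
```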