prithvi514 opened a new issue #3828:
URL: https://github.com/apache/iceberg/issues/3828


   Unable to create a basic `iceberg` table (stored on S3) with Spark
   
   ```
   spark.sql("CREATE TABLE table32 (id bigint, data string) USING iceberg")
   
   ---------------------------------------------------------------------------
   Py4JJavaError                             Traceback (most recent call last)
   /tmp/ipykernel_15/955019638.py in <module>
   ----> 1 spark.sql("CREATE TABLE table32 (id bigint, data string) USING iceberg")
   
   /opt/spark/python/pyspark/sql/session.py in sql(self, sqlQuery)
       721         [Row(f1=1, f2='row1'), Row(f1=2, f2='row2'), Row(f1=3, f2='row3')]
       722         """
   --> 723         return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
       724 
       725     def table(self, tableName):
   
   /usr/local/lib/python3.9/dist-packages/py4j/java_gateway.py in __call__(self, *args)
      1302 
      1303         answer = self.gateway_client.send_command(command)
   -> 1304         return_value = get_return_value(
      1305             answer, self.gateway_client, self.target_id, self.name)
      1306 
   
   /opt/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
       109     def deco(*a, **kw):
       110         try:
   --> 111             return f(*a, **kw)
       112         except py4j.protocol.Py4JJavaError as e:
       113             converted = convert_exception(e.java_exception)
   
   /usr/local/lib/python3.9/dist-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
       324             value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
       325             if answer[1] == REFERENCE_TYPE:
   --> 326                 raise Py4JJavaError(
       327                     "An error occurred while calling {0}{1}{2}.\n".
       328                     format(target_id, ".", name), value)
   
   Py4JJavaError: An error occurred while calling o224.sql.
   : java.lang.NoSuchMethodError: org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Ljava/util/concurrent/ExecutorService;IZ)V
        at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:824)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1118)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:987)
        at org.apache.iceberg.hadoop.HadoopOutputFile.createOrOverwrite(HadoopOutputFile.java:85)
        at org.apache.iceberg.TableMetadataParser.internalWrite(TableMetadataParser.java:119)
        at org.apache.iceberg.TableMetadataParser.overwrite(TableMetadataParser.java:109)
        at org.apache.iceberg.BaseMetastoreTableOperations.writeNewMetadata(BaseMetastoreTableOperations.java:154)
        at org.apache.iceberg.hive.HiveTableOperations.doCommit(HiveTableOperations.java:206)
        at org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:126)
        at org.apache.iceberg.BaseMetastoreCatalog$BaseMetastoreCatalogTableBuilder.create(BaseMetastoreCatalog.java:216)
        at org.apache.iceberg.CachingCatalog$CachingTableBuilder.lambda$create$0(CachingCatalog.java:212)
        at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2344)
        at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
        at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2342)
        at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2325)
        at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
        at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalManualCache.get(LocalManualCache.java:62)
        at org.apache.iceberg.CachingCatalog$CachingTableBuilder.create(CachingCatalog.java:210)
        at org.apache.iceberg.spark.SparkCatalog.createTable(SparkCatalog.java:139)
        at org.apache.iceberg.spark.SparkCatalog.createTable(SparkCatalog.java:81)
        at org.apache.iceberg.spark.SparkSessionCatalog.createTable(SparkSessionCatalog.java:130)
        at org.apache.spark.sql.execution.datasources.v2.CreateTableExec.run(CreateTableExec.scala:41)
        at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:40)
        at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:40)
        at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:46)
        at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
        at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
        at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685)
        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:228)
        at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
        at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
   ```
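
   From searching around, a `NoSuchMethodError` on `SemaphoredDelegatingExecutor.<init>` usually means that `hadoop-aws` and the `hadoop-common` actually on the classpath come from different Hadoop releases (the constructor signature changed between versions). Here is a quick diagnostic sketch, not part of the original report, to confirm which Hadoop build the session loaded and which jar the class was resolved from; it uses only standard Hadoop/Java APIs reached through py4j:

   ```
   # Diagnostic sketch: check the Hadoop version on Spark's classpath and the
   # jar that provided the class whose constructor could not be found.
   jvm = spark.sparkContext._jvm

   # Version compiled into the hadoop-common that Spark actually loaded
   print(jvm.org.apache.hadoop.util.VersionInfo.getVersion())

   # Jar from which SemaphoredDelegatingExecutor was resolved
   cls = jvm.java.lang.Class.forName(
       "org.apache.hadoop.util.SemaphoredDelegatingExecutor")
   print(cls.getProtectionDomain().getCodeSource().getLocation())
   ```

   If the printed version is not 3.2.2, then `hadoop-aws-3.2.2.jar` was compiled against a constructor that does not exist in the `hadoop-common` being loaded, which would explain the failure above.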
   
   I have no issues creating a simple Hive table, and I can see the `student5` folder created in my S3 bucket after this command. (Presumably the plain `CREATE TABLE` only hits the metastore and creates the directory, whereas the Iceberg `CREATE TABLE` also writes a metadata file through `S3AFileSystem.create`, which is where the trace above fails.)
   ```
   spark.sql("CREATE TABLE student5 (id INT, name STRING, age INT)")
   
   WARN ResolveSessionCatalog: A Hive serde table will be created as there is no table provider specified. You can set spark.sql.legacy.createHiveTableByDefault to false so that native data source table will be created instead.
   DataFrame[]
   ```
   
   Setup details:
   1. Spark on kubernetes
   2. Remote hive metastore service (mysql backend)
   3. Spark pods connect to metastore service via thrift
   4. Versions (see the version-alignment note after this list)
      - Spark: 3.1.2
      - Hadoop: 3.2.2
      - Hive Standalone Metastore: 3.0.0
      - aws-java-sdk-bundle-1.11.563.jar
      - hadoop-aws-3.2.2.jar
      - guava jar in hive metastore updated to guava-27.0-jre.jar
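
   One thing worth double-checking with the versions above: the stock Spark 3.1.2 "Hadoop 3.2" distribution bundles Hadoop 3.2.0 jars, so adding `hadoop-aws-3.2.2.jar` on top of them produces exactly this kind of constructor mismatch. Below is a minimal sketch of pinning everything to one Hadoop release at session startup; the Iceberg catalog settings are assumptions (the report does not include the Spark conf), and the Hadoop version should be adjusted to whatever `VersionInfo.getVersion()` reports:

   ```
   # Hypothetical launcher config: resolve hadoop-aws at the SAME version as
   # the bundled hadoop-common (3.2.0 for the stock Spark 3.1.2 / Hadoop 3.2
   # build). Note spark.jars.packages only takes effect when the JVM starts,
   # i.e. it cannot be applied to an already-running session.
   from pyspark.sql import SparkSession

   spark = (
       SparkSession.builder
       .config("spark.jars.packages",
               "org.apache.hadoop:hadoop-aws:3.2.0,"
               "org.apache.iceberg:iceberg-spark3-runtime:0.12.1")
       .config("spark.sql.extensions",
               "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
       # SparkSessionCatalog matches the catalog class shown in the trace above
       .config("spark.sql.catalog.spark_catalog",
               "org.apache.iceberg.spark.SparkSessionCatalog")
       .config("spark.sql.catalog.spark_catalog.type", "hive")
       .getOrCreate()
   )
   ```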

