prithvi514 opened a new issue #3828:
URL: https://github.com/apache/iceberg/issues/3828
Unable to create a basic `iceberg` table (stored on S3) with Spark
```
spark.sql("CREATE TABLE table32 (id bigint, data string) USING iceberg")
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
/tmp/ipykernel_15/955019638.py in <module>
----> 1 spark.sql("CREATE TABLE table32 (id bigint, data string) USING
iceberg")
/opt/spark/python/pyspark/sql/session.py in sql(self, sqlQuery)
721 [Row(f1=1, f2='row1'), Row(f1=2, f2='row2'), Row(f1=3,
f2='row3')]
722 """
--> 723 return DataFrame(self._jsparkSession.sql(sqlQuery),
self._wrapped)
724
725 def table(self, tableName):
/usr/local/lib/python3.9/dist-packages/py4j/java_gateway.py in
__call__(self, *args)
1302
1303 answer = self.gateway_client.send_command(command)
-> 1304 return_value = get_return_value(
1305 answer, self.gateway_client, self.target_id, self.name)
1306
/opt/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
109 def deco(*a, **kw):
110 try:
--> 111 return f(*a, **kw)
112 except py4j.protocol.Py4JJavaError as e:
113 converted = convert_exception(e.java_exception)
/usr/local/lib/python3.9/dist-packages/py4j/protocol.py in
get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:],
gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o224.sql.
: java.lang.NoSuchMethodError: org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Ljava/util/concurrent/ExecutorService;IZ)V
    at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:824)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1118)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:987)
    at org.apache.iceberg.hadoop.HadoopOutputFile.createOrOverwrite(HadoopOutputFile.java:85)
    at org.apache.iceberg.TableMetadataParser.internalWrite(TableMetadataParser.java:119)
    at org.apache.iceberg.TableMetadataParser.overwrite(TableMetadataParser.java:109)
    at org.apache.iceberg.BaseMetastoreTableOperations.writeNewMetadata(BaseMetastoreTableOperations.java:154)
    at org.apache.iceberg.hive.HiveTableOperations.doCommit(HiveTableOperations.java:206)
    at org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:126)
    at org.apache.iceberg.BaseMetastoreCatalog$BaseMetastoreCatalogTableBuilder.create(BaseMetastoreCatalog.java:216)
    at org.apache.iceberg.CachingCatalog$CachingTableBuilder.lambda$create$0(CachingCatalog.java:212)
    at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2344)
    at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
    at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2342)
    at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2325)
    at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
    at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalManualCache.get(LocalManualCache.java:62)
    at org.apache.iceberg.CachingCatalog$CachingTableBuilder.create(CachingCatalog.java:210)
    at org.apache.iceberg.spark.SparkCatalog.createTable(SparkCatalog.java:139)
    at org.apache.iceberg.spark.SparkCatalog.createTable(SparkCatalog.java:81)
    at org.apache.iceberg.spark.SparkSessionCatalog.createTable(SparkSessionCatalog.java:130)
    at org.apache.spark.sql.execution.datasources.v2.CreateTableExec.run(CreateTableExec.scala:41)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:40)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:40)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:46)
    at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:228)
    at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
```
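From what I understand, a `NoSuchMethodError` on the `SemaphoredDelegatingExecutor` constructor usually indicates that `hadoop-aws` and the `hadoop-common` classes actually loaded by the JVM come from different Hadoop versions. A quick way to confirm which Hadoop build the driver JVM loaded, from the same PySpark session (a diagnostic sketch on my part, using the standard `VersionInfo` Hadoop API; it was not part of the original repro):

```python
# Ask the driver JVM (via py4j) which hadoop-common build it loaded.
# If this does not print 3.2.2, the hadoop-aws-3.2.2.jar on the
# classpath is paired with a different hadoop-common, and the
# SemaphoredDelegatingExecutor constructor signature won't match.
print(spark.sparkContext._jvm.org.apache.hadoop.util.VersionInfo.getVersion())
```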
I have no issues creating a plain Hive table; I can see a `student5` folder
created in my S3 bucket after running this command:
```
spark.sql("CREATE TABLE student5 (id INT, name STRING, age INT)")
WARN ResolveSessionCatalog: A Hive serde table will be created as there is no table provider specified. You can set spark.sql.legacy.createHiveTableByDefault to false so that native data source table will be created instead.
DataFrame[]
```
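(The warning itself is harmless here. For reference, the behavior it suggests can be enabled per session, along these lines; `student6` is just a throwaway name I made up for illustration:)

```python
# Follow the warning's suggestion: with this flag off, CREATE TABLE
# without a USING clause builds a native data source table instead
# of a Hive serde table.
spark.conf.set("spark.sql.legacy.createHiveTableByDefault", "false")
spark.sql("CREATE TABLE student6 (id INT, name STRING, age INT)")
```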
Setup details:
1. Spark on kubernetes
2. Remote hive metastore service (mysql backend)
3. Spark pods connect to metastore service via thrift
4. Versions
- Spark: 3.1.2
- Hadoop: 3.2.2
- Hive Standalone Metastore: 3.0.0
- aws-java-sdk-bundle-1.11.563.jar
- hadoop-aws-3.2.2.jar
- Guava jar in the Hive Metastore updated to guava-27.0-jre.jar
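
For completeness, the session wiring is roughly as below (a sketch reconstructed from the stack trace, which shows `SparkSessionCatalog` in use; the metastore URI is a placeholder, not the real service address):

```python
from pyspark.sql import SparkSession

# Approximate catalog configuration implied by the stack trace.
# The thrift URI below is a placeholder for the in-cluster
# metastore service, not the actual address.
spark = (
    SparkSession.builder
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.iceberg.spark.SparkSessionCatalog")
    .config("spark.sql.catalog.spark_catalog.type", "hive")
    .config("spark.sql.catalog.spark_catalog.uri", "thrift://hive-metastore:9083")
    .enableHiveSupport()
    .getOrCreate()
)
```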