RussellSpitzer commented on issue #1533:
URL: https://github.com/apache/iceberg/issues/1533#issuecomment-702198282
I believe this is caused by an incompatibility between the Hive client being used and the Hive metastore being used. The inner error is:

    Caused by: InvalidObjectException(message:db)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:42216)

which is thrown when the Thrift API makes a call and is unable to work with the result. My guess would be that the Hive client included at runtime within Spark is not compatible with the metastore you are running. I know there are several incompatibilities between certain metastore versions and certain versions of Spark. I couldn't find any tickets for a 2.7.x metastore in the Spark JIRA, but I did see support for 2.3, 2.4, 3.0, and 3.1, so it may be related to that. What version of the metastore are you using? (Two sketches for checking and aligning versions follow the quoted report below.)

On Wed, Sep 30, 2020 at 9:17 AM Vijay Akkineni <[email protected]> wrote:

> Hi,
>
> I am new to Iceberg and trying to follow the getting-started section, but I am running into an error while creating a table using Iceberg. Below is my environment setup; any help is appreciated.
>
> Hive: 2.7.3, using Postgres as the metastore
> Spark: 3.0.1 with Hadoop 2.7.4 and S3A
> spark.sql.warehouse.dir: 's3a://testbucket'
>
>     ./spark-sql \
>       --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
>       --conf spark.sql.catalog.spark_catalog.type=hive \
>       --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
>       --conf spark.sql.catalog.local.type=hive \
>       --conf spark.sql.catalog.local.uri=thrift://192.168.86.240:9083
>
> SQL command:
>
>     CREATE TABLE local.db.sample (id bigint COMMENT 'unique id', data string, category string)
>     USING iceberg
>     PARTITIONED BY (category);
>
> I do see a directory getting created in the S3 bucket (testbucket/db.db/sample/metadata), but the command fails with the error below.
> The exception while creating the table is:
>
>     20/09/30 10:01:00 ERROR SparkSQLDriver: Failed in [CREATE TABLE local.db.sample (id bigint COMMENT 'unique id', data string, category string) USING iceberg PARTITIONED BY (category)]
>     java.lang.RuntimeException: Metastore operation failed for db.sample
>         at org.apache.iceberg.hive.HiveTableOperations.doCommit(HiveTableOperations.java:200)
>         at org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:103)
>         at org.apache.iceberg.BaseMetastoreCatalog.createTable(BaseMetastoreCatalog.java:71)
>         at org.apache.iceberg.CachingCatalog.lambda$createTable$0(CachingCatalog.java:75)
>         at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2337)
>         at java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1908)
>         at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2335)
>         at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2318)
>         at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:111)
>         at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalManualCache.get(LocalManualCache.java:54)
>         at org.apache.iceberg.CachingCatalog.createTable(CachingCatalog.java:73)
>         at org.apache.iceberg.spark.SparkCatalog.createTable(SparkCatalog.java:136)
>         at org.apache.iceberg.spark.SparkCatalog.createTable(SparkCatalog.java:78)
>         at org.apache.spark.sql.execution.datasources.v2.CreateTableExec.run(CreateTableExec.scala:41)
>         at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:39)
>         at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:39)
>         at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:45)
>         at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
>         at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3618)
>         at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
>         at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
>         at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
>         at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
>         at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
>         at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3616)
>         at org.apache.spark.sql.Dataset.<init>(Dataset.scala:229)
>         at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
>         at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
>         at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
>         at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:607)
>         at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
>         at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602)
>         at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:650)
>         at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:63)
>         at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:377)
>         at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:496)
>         at scala.collection.Iterator.foreach(Iterator.scala:941)
>         at scala.collection.Iterator.foreach$(Iterator.scala:941)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
>         at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>         at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>         at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>         at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:490)
>         at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:282)
>         at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>         at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
>         at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>     Caused by: InvalidObjectException(message:db)
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:42216)
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:42193)
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result.read(ThriftHiveMetastore.java:42119)
>         at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:88)
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_with_environment_context(ThriftHiveMetastore.java:1203)
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_with_environment_context(ThriftHiveMetastore.java:1189)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:2405)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:752)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:740)
>         at org.apache.iceberg.hive.HiveTableOperations.lambda$doCommit$3(HiveTableOperations.java:185)
>         at org.apache.iceberg.hive.ClientPool.run(ClientPool.java:54)
>         at org.apache.iceberg.hive.HiveTableOperations.doCommit(HiveTableOperations.java:184)
>         ... 56 more
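A quick way to answer the version question from the metastore side; this is only a sketch, and the Postgres database name `metastore` is an assumption (use whatever your javax.jdo.option.ConnectionURL points at):

    # Option 1: run on the metastore host; schematool reports the schema
    # version the metastore was initialized with (-dbType matches the RDBMS).
    $HIVE_HOME/bin/schematool -dbType postgres -info

    # Option 2: query the metastore's VERSION table directly in Postgres.
    # ("metastore" is an assumed database name.)
    psql -d metastore -c 'SELECT "SCHEMA_VERSION", "VERSION_COMMENT" FROM "VERSION";'

Either should print something like 2.3.0 or 3.1.0 for the schema version, which you can compare against the Hive client jars on Spark's classpath.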
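If the versions do turn out to be mismatched, one thing worth trying is telling Spark to load Hive client jars that match the metastore server. This is only a sketch under the assumption that a mismatch is the cause; 2.3.7 below is a placeholder version, and these settings control the client used by Spark's built-in session catalog, so whether the Iceberg catalogs end up with compatible classes still depends on what is on the classpath:

    # Placeholder version: substitute whatever the metastore check reported.
    # jars=maven tells Spark to download matching client jars at startup.
    ./spark-sql \
      --conf spark.sql.hive.metastore.version=2.3.7 \
      --conf spark.sql.hive.metastore.jars=maven \
      ...   # plus the catalog --conf flags from your original command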
