Re: [I] [Bug] kyuubi failed to read the iceberg of another cluster across domains [kyuubi]

via GitHub Thu, 16 Jan 2025 00:21:59 -0800


zxl-333 commented on issue #6890:
URL: https://github.com/apache/kyuubi/issues/6890#issuecomment-2594802403


   > Did you repeatedly configure spark_catalog_ky catalog? You should remove 
`spark.sql.catalog.spark_catalog_ky=org.apache.kyuubi.spark.connector.hive.HiveTableCatalog`
   > 
   > 
![Image](https://github.com/user-attachments/assets/ff5f3e01-2c09-44ff-9e73-415569f7b50e)
   
   When I remove the 
spark.sql.catalog.spark_catalog_ky=org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
 Spark. The hive. HiveTableCatalog, throws connection metastore failed anomalies
   
   **first method -----------------------**
   **spark-defaults.conf**
   #---------------begin--------------
   spark.kerberos.access.hadoopFileSystems  hdfs://myns,hdfs://mynsbackup
   spark.sql.catalog.hive_catalog     
org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
   spark.sql.catalog.hive_catalog.hive.metastore.uris     
thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083
   
#spark.sql.catalog.hive_catalog.hive.metastore.token.signature=thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083
   
   
   #local cluster metastore
   spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog
   #spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog
   spark.sql.catalog.spark_catalog.type=hive
   
spark.sql.catalog.spark_catalog.uri=thrift://bigdata-1734358521-u7gjy:9083,thrift://bigdata-1734358521-gsy9x:9083
   #配置下面是为了解决当前iceberg的uri和HiveConf.ConfVars.METASTOREURIS不相等问题
   #spark.sql.catalog.spark_catalog 
org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
   #spark.sql.catalog.spark_catalog.hive.metastore.uris 
thrift://bigdata-1734358521-u7gjy:9083,thrift://bigdata-1734358521-gsy9x:9083
   
   #an other cluster metastore
   **spark.sql.catalog.spark_catalog_ky=org.apache.iceberg.spark.SparkCatalog**
   spark.sql.catalog.spark_catalog_ky.type=hive
   spark.sql.catalog.spark_catalog_ky.uri=thrift://bigdata-1734405115-lhalh:9083
   
   
**#spark.sql.catalog.spark_catalog_ky=org.apache.kyuubi.spark.connector.hive.HiveTableCatalog**
   
#spark.sql.catalog.spark_catalog_ky.hive.metastore.uris=thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083
   
#spark.sql.catalog.spark_catalog_ky.hive.metastore.kerberos.principal=hive/_h...@mr.0c20903122394b4293a44ead5cd1a27e.yun.cn
   #spark.sql.catalog.spark_catalog_ky.hive.metastore.sasl.enabled=true
   
#spark.sql.catalog.spark_catalog_ky.hive.metastore.token.signature=thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083
   #--------------end----------------
   
   **exception:**
   25/01/16 15:57:42 INFO metastore: Trying to connect to metastore with URI 
thrift://bigdata-1734405115-lhalh:9083
   25/01/16 15:57:42 WARN metastore: Failed to connect to the MetaStore 
Server...
   25/01/16 15:57:42 INFO metastore: Waiting 3 seconds before next connection 
attempt.
   2025-01-16 15:57:45.449 INFO KyuubiSessionManager-exec-pool: Thread-93 
org.apache.kyuubi.operation.ExecuteStatement: 
Query[f7e6ec20-2ec4-47dc-be3a-1989a3b46bfd] in RUNNING_STATE
   25/01/16 15:57:45 INFO DAGScheduler: Asked to cancel job group 
f7e6ec20-2ec4-47dc-be3a-1989a3b46bfd
   25/01/16 15:57:45 ERROR ExecuteStatement: Error operating ExecuteStatement: 
org.apache.iceberg.hive.RuntimeMetaException: Failed to connect to Hive 
Metastore
        at 
org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:84)
        at 
org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:34)
        at org.apache.iceberg.ClientPoolImpl.get(ClientPoolImpl.java:125)
        at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:56)
        at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:51)
        at 
org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:122)
        at 
org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:158)
        at 
org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:97)
        at 
org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:80)
        at 
org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:47)
        at 
org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2406)
        at 
java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
        at 
org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2404)
        at 
org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2387)
        at 
org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
        at 
org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalManualCache.get(LocalManualCache.java:62)
        at org.apache.iceberg.CachingCatalog.loadTable(CachingCatalog.java:166)
        at org.apache.iceberg.spark.SparkCatalog.load(SparkCatalog.java:642)
        at 
org.apache.iceberg.spark.SparkCatalog.loadTable(SparkCatalog.java:160)
        at 
org.apache.spark.sql.connector.catalog.CatalogV2Util$.loadTable(CatalogV2Util.scala:311)
        at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$3(Analyzer.scala:1197)
        at scala.Option.orElse(Option.scala:447)
        at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$1(Analyzer.scala:1196)
        at scala.Option.orElse(Option.scala:447)
        at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupRelation(Analyzer.scala:1188)
        at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$13.applyOrElse(Analyzer.scala:1059)
        at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$13.applyOrElse(Analyzer.scala:1023)
        at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$3(AnalysisHelper.scala:138)
        at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
        at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:138)
        at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
        at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:134)
        at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:130)
        at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:30)
        at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$2(AnalysisHelper.scala:135)
        at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
        at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
        at 
org.apache.spark.sql.catalyst.plans.logical.Aggregate.mapChildren(basicLogicalOperators.scala:977)
        at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:135)
        at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
        at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:134)
        at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:130)
        at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:30)
        at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:1023)
        at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:982)
        at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
        at 
scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
        at 
scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
        at scala.collection.immutable.List.foldLeft(List.scala:91)
        at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
        at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
        at scala.collection.immutable.List.foreach(List.scala:431)
        at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
        at 
org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:231)
        at 
org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$execute$1(Analyzer.scala:227)
        at 
org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withNewAnalysisContext(Analyzer.scala:173)
        at 
org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:227)
        at 
org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:188)
        at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
        at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
        at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
        at 
org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:212)
        at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
        at 
org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:211)
        at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:76)
        at 
org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
        at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
        at 
org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
        at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
        at 
org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
        at 
org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
        at 
org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
        at 
org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
        at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:98)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
        at 
org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:622)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
        at 
org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:86)
        at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at 
org.apache.kyuubi.engine.spark.operation.SparkOperation.$anonfun$withLocalProperties$1(SparkOperation.scala:147)
        at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
        at 
org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:131)
        at 
org.apache.kyuubi.engine.spark.operation.ExecuteStatement.executeStatement(ExecuteStatement.scala:81)
        at 
org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:103)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
   Caused by: java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient
        at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1742)
        at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
        at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
        at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
        at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:97)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.iceberg.common.DynMethods$UnboundMethod.invokeChecked(DynMethods.java:60)
        at 
org.apache.iceberg.common.DynMethods$UnboundMethod.invoke(DynMethods.java:72)
        at 
org.apache.iceberg.common.DynMethods$StaticMethod.invoke(DynMethods.java:185)
        at 
org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:63)
        ... 91 more
   Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1740)
        ... 103 more
   
   **second method --------------**
   When both of my catalogs are configured, it is normal to connect to 
metastore. However, when reading tables, kyuubi spark hive scan is always used, 
and the iceberg scan cannot be used to query data
   #---------------begin--------------
   spark.kerberos.access.hadoopFileSystems  hdfs://myns,hdfs://mynsbackup
   spark.sql.catalog.hive_catalog     
org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
   spark.sql.catalog.hive_catalog.hive.metastore.uris     
thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083
   
#spark.sql.catalog.hive_catalog.hive.metastore.token.signature=thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083
   
   
   #local cluster metastore
   spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog
   #spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog
   spark.sql.catalog.spark_catalog.type=hive
   
spark.sql.catalog.spark_catalog.uri=thrift://bigdata-1734358521-u7gjy:9083,thrift://bigdata-1734358521-gsy9x:9083
   #配置下面是为了解决当前iceberg的uri和HiveConf.ConfVars.METASTOREURIS不相等问题
   #spark.sql.catalog.spark_catalog 
org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
   #spark.sql.catalog.spark_catalog.hive.metastore.uris 
thrift://bigdata-1734358521-u7gjy:9083,thrift://bigdata-1734358521-gsy9x:9083
   
   #an other cluster metastore
   **spark.sql.catalog.spark_catalog_ky=org.apache.iceberg.spark.SparkCatalog**
   spark.sql.catalog.spark_catalog_ky.type=hive
   spark.sql.catalog.spark_catalog_ky.uri=thrift://bigdata-1734405115-lhalh:9083
   
   
**spark.sql.catalog.spark_catalog_ky=org.apache.kyuubi.spark.connector.hive.HiveTableCatalog**
   
#spark.sql.catalog.spark_catalog_ky.hive.metastore.uris=thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083
   
#spark.sql.catalog.spark_catalog_ky.hive.metastore.kerberos.principal=hive/_h...@mr.0c20903122394b4293a44ead5cd1a27e.yun.cn
   #spark.sql.catalog.spark_catalog_ky.hive.metastore.sasl.enabled=true
   
#spark.sql.catalog.spark_catalog_ky.hive.metastore.token.signature=thrift://bigdata-1734405115-lhalh:9083,thrift://bigdata-1734405115-0xt70:9083
   #--------------end----------------
   
   The sql execution plan is as follows:
   | == Physical Plan ==
   AdaptiveSparkPlan isFinalPlan=false
   +- HashAggregate(keys=[k#32], functions=[count(1)])
      +- Exchange hashpartitioning(k#32, 800), ENSURE_REQUIREMENTS, [plan_id=65]
         +- HashAggregate(keys=[k#32], functions=[partial_count(1)])
            +- Project [k#32]
               +- BatchScan[k#32] HiveScan DataFilters: [], Format: hive, 
Location: HiveCatalogFileIndex(1 
paths)[hdfs://mynsbackup/warehouse/tablespace/managed/hive/test_iceberg..., 
PartitionFilters: [], ReadSchema: struct<k:string> RuntimeFilters: []
   
    |


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscr...@kyuubi.apache.org
For additional commands, e-mail: notifications-h...@kyuubi.apache.org

Re: [I] [Bug] kyuubi failed to read the iceberg of another cluster across domains [kyuubi]

Reply via email to