ShubhamSharmaCSE opened a new issue, #7221:
URL: https://github.com/apache/iceberg/issues/7221

   I am getting the error below when running a Spark application against a partitioned Iceberg table. The table is partitioned on the created timestamp using the `days` transform.
   
   ```
   Exception in thread "main" java.lang.IllegalArgumentException: Multiple entries with same key: 1000=original_values.column1.column2.column3 and 1000=_partition.created_ts_day
        at org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap.conflictException(ImmutableMap.java:376)
        at org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:370)
        at org.apache.iceberg.relocated.com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:153)
        at org.apache.iceberg.relocated.com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:115)
        at org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap$Builder.buildOrThrow(ImmutableMap.java:574)
        at org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:538)
        at org.apache.iceberg.types.IndexByName.byId(IndexByName.java:80)
        at org.apache.iceberg.types.TypeUtil.indexNameById(TypeUtil.java:161)
        at org.apache.iceberg.Schema.lazyIdToName(Schema.java:167)
        at org.apache.iceberg.Schema.<init>(Schema.java:108)
        at org.apache.iceberg.Schema.<init>(Schema.java:91)
        at org.apache.iceberg.Schema.<init>(Schema.java:83)
        at org.apache.iceberg.Schema.<init>(Schema.java:79)
        at org.apache.iceberg.types.TypeUtil.join(TypeUtil.java:149)
        at org.apache.iceberg.spark.source.SparkScanBuilder.schemaWithMetadataColumns(SparkScanBuilder.java:159)
        at org.apache.iceberg.spark.source.SparkScanBuilder.buildMergeOnReadScan(SparkScanBuilder.java:225)
        at org.apache.iceberg.spark.source.SparkPositionDeltaOperation$1.build(SparkPositionDeltaOperation.java:70)
        at org.apache.spark.sql.execution.datasources.v2.PushDownUtils$.pruneColumns(PushDownUtils.scala:176)
        at org.apache.spark.sql.execution.datasources.v2.RowLevelCommandScanRelationPushDown$$anonfun$apply$1.applyOrElse(RowLevelCommandScanRelationPushDown.scala:52)
        at org.apache.spark.sql.execution.datasources.v2.RowLevelCommandScanRelationPushDown$$anonfun$apply$1.applyOrElse(RowLevelCommandScanRelationPushDown.scala:40)
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:615)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:177)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:615)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:591)
        at org.apache.spark.sql.execution.datasources.v2.RowLevelCommandScanRelationPushDown$.apply(RowLevelCommandScanRelationPushDown.scala:40)
        at org.apache.spark.sql.execution.datasources.v2.RowLevelCommandScanRelationPushDown$.apply(RowLevelCommandScanRelationPushDown.scala:37)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:215)
        at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
        at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
        at scala.collection.immutable.List.foldLeft(List.scala:91)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeBatch$1(RuleExecutor.scala:212)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$6(RuleExecutor.scala:284)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$RuleExecutionContext$.withContext(RuleExecutor.scala:327)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$5(RuleExecutor.scala:284)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$5$adapted(RuleExecutor.scala:274)
        at scala.collection.immutable.List.foreach(List.scala:431)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:274)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:188)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
        at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:134)
        at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:192)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:213)
        at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:552)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:213)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
        at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:212)
        at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:130)
        at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:126)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$writePlans$4(QueryExecution.scala:296)
        at org.apache.spark.sql.catalyst.plans.QueryPlan$.append(QueryPlan.scala:657)
        at org.apache.spark.sql.execution.QueryExecution.writePlans(QueryExecution.scala:296)
        at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:313)
        at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:267)
        at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:246)
        at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:107)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$7(SQLExecution.scala:139)
        at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
        at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:139)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:245)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:138)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
        at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:100)
        at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:96)
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:615)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:177)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:615)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:591)
        at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:96)
        at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:83)
        at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:81)
        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:222)
        at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:102)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:99)
        at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:622)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
   ```
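   
   A minimal sketch of a setup matching this description is below (catalog, table, and column names are placeholders, and the DELETE stands in for whichever row-level command the application actually runs); the stack trace shows the failure happens while planning the merge-on-read scan for such a command (`SparkPositionDeltaOperation` / `buildMergeOnReadScan`):
   
   ```
   // Hypothetical reproduction sketch; names are placeholders, not the real schema.
   import org.apache.spark.sql.SparkSession
   
   object Repro {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder().appName("iceberg-repro").getOrCreate()
   
       // Partitioned Iceberg table: partition is days(created_ts), deletes use merge-on-read.
       spark.sql("""
         CREATE TABLE my_catalog.db.events (
           id BIGINT,
           original_values STRUCT<column1: STRUCT<column2: STRUCT<column3: STRING>>>,
           created_ts TIMESTAMP
         ) USING iceberg
         PARTITIONED BY (days(created_ts))
         TBLPROPERTIES ('write.delete.mode' = 'merge-on-read')
       """)
   
       // A row-level command against the table goes through the merge-on-read scan
       // planning path where the IllegalArgumentException above is thrown.
       spark.sql("DELETE FROM my_catalog.db.events WHERE created_ts < TIMESTAMP '2023-01-01 00:00:00'")
     }
   }
   ```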

