zhanghaicheng1 opened a new issue #3016:
URL: https://github.com/apache/iceberg/issues/3016
spark version: 3.1.2
org.apache.iceberg:iceberg-hive: 0.11.1
scala.version: 2.12.8
hadoop.version: 3.0.0-cdh6.1.1
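For completeness, a minimal sbt sketch of the dependencies this setup assumes (versions are the ones listed above; the exact artifact names are my assumption and may need adjusting to the actual build):
```
// Hypothetical sbt dependency sketch; versions match those listed above.
// Artifact names are assumptions -- adjust to the build actually in use.
libraryDependencies ++= Seq(
  "org.apache.spark"   %% "spark-sql"              % "3.1.2" % "provided",
  "org.apache.iceberg" %  "iceberg-spark3-runtime" % "0.11.1",
  "org.apache.iceberg" %  "iceberg-hive"           % "0.11.1"
)
```
The code that reproduces the problem: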
```
package com.aisino.delete

import org.apache.iceberg.spark.SparkCatalog
import org.apache.spark.sql.SparkSession

object DeleteByContition {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .config("spark.sql.catalog.hadoop_prod.type", "hadoop") // use a Hadoop catalog
      .config("spark.sql.catalog.hadoop_prod", classOf[SparkCatalog].getName)
      // root directory (warehouse location) of the Hadoop catalog
      .config("spark.sql.catalog.hadoop_prod.warehouse",
        "hdfs://centos4:8020/doit/iceberg/warehouse/")
      .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
      .appName(this.getClass.getSimpleName)
      .master("local[*]")
      .getOrCreate()

    val deleteSingleDataSQL = "DELETE FROM hadoop_prod.logging.tb_user1 WHERE id = 3"

    spark.table("hadoop_prod.logging.tb_user1").show
    spark.sql(deleteSingleDataSQL)
    spark.table("hadoop_prod.logging.tb_user1").show
  }
}
```
When this code runs, the job fails with the following exception (full log below):
```
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
21/08/24 10:14:42 WARN Utils: Your hostname, ocean resolves to a loopback
address: 127.0.0.1; using 192.168.2.162 instead (on interface en0)
21/08/24 10:14:42 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
another address
21/08/24 10:14:43 INFO SparkContext: Running Spark version 3.1.2
21/08/24 10:14:43 INFO ResourceUtils:
==============================================================
21/08/24 10:14:43 INFO ResourceUtils: No custom resources configured for
spark.driver.
21/08/24 10:14:43 INFO ResourceUtils:
==============================================================
21/08/24 10:14:43 INFO SparkContext: Submitted application:
DeleteByContition$
21/08/24 10:14:43 INFO ResourceProfile: Default ResourceProfile created,
executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: ,
memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name:
offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name:
cpus, amount: 1.0)
21/08/24 10:14:43 INFO ResourceProfile: Limiting resource is cpu
21/08/24 10:14:43 INFO ResourceProfileManager: Added ResourceProfile id: 0
21/08/24 10:14:43 INFO SecurityManager: Changing view acls to: zhc
21/08/24 10:14:43 INFO SecurityManager: Changing modify acls to: zhc
21/08/24 10:14:43 INFO SecurityManager: Changing view acls groups to:
21/08/24 10:14:43 INFO SecurityManager: Changing modify acls groups to:
21/08/24 10:14:43 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(zhc); groups with
view permissions: Set(); users with modify permissions: Set(zhc); groups with
modify permissions: Set()
21/08/24 10:14:43 INFO Utils: Successfully started service 'sparkDriver' on
port 64863.
21/08/24 10:14:43 INFO SparkEnv: Registering MapOutputTracker
21/08/24 10:14:43 INFO SparkEnv: Registering BlockManagerMaster
21/08/24 10:14:43 INFO BlockManagerMasterEndpoint: Using
org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/08/24 10:14:43 INFO BlockManagerMasterEndpoint:
BlockManagerMasterEndpoint up
21/08/24 10:14:43 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
21/08/24 10:14:43 INFO DiskBlockManager: Created local directory at
/private/var/folders/2g/f4g3ss0n0jv5r434ngcjbbhr0000gn/T/blockmgr-7e501f6e-2327-4c37-8777-900e8c25f7ea
21/08/24 10:14:43 INFO MemoryStore: MemoryStore started with capacity 4.1 GiB
21/08/24 10:14:43 INFO SparkEnv: Registering OutputCommitCoordinator
21/08/24 10:14:44 INFO Utils: Successfully started service 'SparkUI' on port
4040.
21/08/24 10:14:44 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at
http://ocean.lan:4040
21/08/24 10:14:44 INFO Executor: Starting executor ID driver on host
ocean.lan
21/08/24 10:14:44 INFO Utils: Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService' on port 64867.
21/08/24 10:14:44 INFO NettyBlockTransferService: Server created on
ocean.lan:64867
21/08/24 10:14:44 INFO BlockManager: Using
org.apache.spark.storage.RandomBlockReplicationPolicy for block replication
policy
21/08/24 10:14:44 INFO BlockManagerMaster: Registering BlockManager
BlockManagerId(driver, ocean.lan, 64867, None)
21/08/24 10:14:44 INFO BlockManagerMasterEndpoint: Registering block manager
ocean.lan:64867 with 4.1 GiB RAM, BlockManagerId(driver, ocean.lan, 64867, None)
21/08/24 10:14:44 INFO BlockManagerMaster: Registered BlockManager
BlockManagerId(driver, ocean.lan, 64867, None)
21/08/24 10:14:44 INFO BlockManager: Initialized BlockManager:
BlockManagerId(driver, ocean.lan, 64867, None)
21/08/24 10:14:44 INFO SharedState: spark.sql.warehouse.dir is not set, but
hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the
value of hive.metastore.warehouse.dir ('/user/hive/warehouse').
21/08/24 10:14:44 INFO SharedState: Warehouse path is '/user/hive/warehouse'.
21/08/24 10:14:46 INFO BaseMetastoreCatalog: Table loaded by catalog:
hadoop_prod.logging.tb_user1
21/08/24 10:14:47 INFO MemoryStore: Block broadcast_0 stored as values in
memory (estimated size 438.4 KiB, free 4.1 GiB)
21/08/24 10:14:47 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes
in memory (estimated size 42.7 KiB, free 4.1 GiB)
21/08/24 10:14:47 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory
on ocean.lan:64867 (size: 42.7 KiB, free: 4.1 GiB)
21/08/24 10:14:47 INFO SparkContext: Created broadcast 0 from broadcast at
SparkScanBuilder.java:171
21/08/24 10:14:47 INFO MemoryStore: Block broadcast_1 stored as values in
memory (estimated size 40.0 B, free 4.1 GiB)
21/08/24 10:14:47 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes
in memory (estimated size 116.0 B, free 4.1 GiB)
21/08/24 10:14:47 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory
on ocean.lan:64867 (size: 116.0 B, free: 4.1 GiB)
21/08/24 10:14:47 INFO SparkContext: Created broadcast 1 from broadcast at
SparkScanBuilder.java:172
21/08/24 10:14:47 INFO V2ScanRelationPushDown:
Pushing operators to hadoop_prod.logging.tb_user1
Pushed Filters:
Post-Scan Filters:
Output: id#0, name#1, age#2
21/08/24 10:14:47 INFO BaseTableScan: Scanning table
hadoop_prod.logging.tb_user1 snapshot 4602380977344673234 created at 2021-08-19
11:14:20.938 with filter true
21/08/24 10:14:49 INFO CodeGenerator: Code generated in 323.058098 ms
21/08/24 10:14:49 INFO SparkContext: Starting job: show at
DeleteByContition.scala:24
21/08/24 10:14:49 INFO DAGScheduler: Got job 0 (show at
DeleteByContition.scala:24) with 1 output partitions
21/08/24 10:14:49 INFO DAGScheduler: Final stage: ResultStage 0 (show at
DeleteByContition.scala:24)
21/08/24 10:14:49 INFO DAGScheduler: Parents of final stage: List()
21/08/24 10:14:49 INFO DAGScheduler: Missing parents: List()
21/08/24 10:14:49 INFO DAGScheduler: Submitting ResultStage 0
(MapPartitionsRDD[3] at show at DeleteByContition.scala:24), which has no
missing parents
21/08/24 10:14:49 INFO MemoryStore: Block broadcast_2 stored as values in
memory (estimated size 8.0 KiB, free 4.1 GiB)
21/08/24 10:14:49 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes
in memory (estimated size 3.8 KiB, free 4.1 GiB)
21/08/24 10:14:49 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory
on ocean.lan:64867 (size: 3.8 KiB, free: 4.1 GiB)
21/08/24 10:14:49 INFO SparkContext: Created broadcast 2 from broadcast at
DAGScheduler.scala:1388
21/08/24 10:14:49 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 0 (MapPartitionsRDD[3] at show at DeleteByContition.scala:24)
(first 15 tasks are for partitions Vector(0))
21/08/24 10:14:49 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
resource profile 0
21/08/24 10:14:49 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID
0) (ocean.lan, executor driver, partition 0, ANY, 8662 bytes)
taskResourceAssignments Map()
21/08/24 10:14:49 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
21/08/24 10:14:49 INFO ZlibFactory: Successfully loaded & initialized
native-zlib library
21/08/24 10:14:49 INFO CodecPool: Got brand-new decompressor [.gz]
21/08/24 10:14:50 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0).
1609 bytes result sent to driver
21/08/24 10:14:50 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID
0) in 884 ms on ocean.lan (executor driver) (1/1)
21/08/24 10:14:50 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks
have all completed, from pool
21/08/24 10:14:50 INFO DAGScheduler: ResultStage 0 (show at
DeleteByContition.scala:24) finished in 0.958 s
21/08/24 10:14:50 INFO DAGScheduler: Job 0 is finished. Cancelling potential
speculative or zombie tasks for this job
21/08/24 10:14:50 INFO TaskSchedulerImpl: Killing all running tasks in stage
0: Stage finished
21/08/24 10:14:50 INFO DAGScheduler: Job 0 finished: show at
DeleteByContition.scala:24, took 0.990896 s
21/08/24 10:14:50 INFO CodeGenerator: Code generated in 21.044329 ms
+---+-------------+---+
| id| name|age|
+---+-------------+---+
| 1|zhanghaicheng| 20|
| 2| xutao| 18|
| 3| sunpengcheng| 19|
+---+-------------+---+
21/08/24 10:14:50 INFO MemoryStore: Block broadcast_3 stored as values in
memory (estimated size 438.4 KiB, free 4.1 GiB)
21/08/24 10:14:50 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes
in memory (estimated size 42.7 KiB, free 4.1 GiB)
21/08/24 10:14:50 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory
on ocean.lan:64867 (size: 42.7 KiB, free: 4.1 GiB)
21/08/24 10:14:50 INFO SparkContext: Created broadcast 3 from broadcast at
SparkScanBuilder.java:171
21/08/24 10:14:50 INFO MemoryStore: Block broadcast_4 stored as values in
memory (estimated size 40.0 B, free 4.1 GiB)
21/08/24 10:14:50 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes
in memory (estimated size 116.0 B, free 4.1 GiB)
21/08/24 10:14:50 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory
on ocean.lan:64867 (size: 116.0 B, free: 4.1 GiB)
21/08/24 10:14:50 INFO SparkContext: Created broadcast 4 from broadcast at
SparkScanBuilder.java:172
21/08/24 10:14:50 INFO V2ScanRelationPushDown:
Pushing operators to hadoop_prod.logging.tb_user1
Pushed Filters:
Post-Scan Filters:
Output: id#22, name#23, age#24
Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot
delete from table hadoop_prod.logging.tb_user1 where [EqualTo(id,3)]
at
org.apache.spark.sql.execution.datasources.v2.DataSourceV2Strategy.apply(DataSourceV2Strategy.scala:251)
at
org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:63)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:489)
at
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
at
org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:67)
at
org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:78)
at
scala.collection.TraversableOnce.$anonfun$foldLeft$1(TraversableOnce.scala:160)
at
scala.collection.TraversableOnce.$anonfun$foldLeft$1$adapted(TraversableOnce.scala:160)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:160)
at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:158)
at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1429)
at
org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$2(QueryPlanner.scala:75)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
at
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
at
org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:67)
at
org.apache.spark.sql.execution.QueryExecution$.createSparkPlan(QueryExecution.scala:391)
at
org.apache.spark.sql.execution.QueryExecution.$anonfun$sparkPlan$1(QueryExecution.scala:104)
at
org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at
org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:143)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at
org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:143)
at
org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:104)
at
org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:97)
at
org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:117)
at
org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at
org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:143)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at
org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:143)
at
org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:117)
at
org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:110)
at
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:101)
at
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:228)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
at
org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
at com.aisino.delete.DeleteByContition$.main(DeleteByContition.scala:25)
at com.aisino.delete.DeleteByContition.main(DeleteByContition.scala)
21/08/24 10:14:50 INFO SparkContext: Invoking stop() from shutdown hook
21/08/24 10:14:50 INFO SparkUI: Stopped Spark web UI at http://ocean.lan:4040
21/08/24 10:14:50 INFO MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!
21/08/24 10:14:50 INFO MemoryStore: MemoryStore cleared
21/08/24 10:14:50 INFO BlockManager: BlockManager stopped
21/08/24 10:14:50 INFO BlockManagerMaster: BlockManagerMaster stopped
21/08/24 10:14:50 INFO
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
OutputCommitCoordinator stopped!
21/08/24 10:14:50 INFO SparkContext: Successfully stopped SparkContext
21/08/24 10:14:50 INFO ShutdownHookManager: Shutdown hook called
21/08/24 10:14:50 INFO ShutdownHookManager: Deleting directory
/private/var/folders/2g/f4g3ss0n0jv5r434ngcjbbhr0000gn/T/spark-481ebd46-87e2-48ea-bb1a-1e2a7c9cc762
Process finished with exit code 1
```
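For reference, the Iceberg documentation also registers its SQL extensions on the session when using SQL commands beyond plain reads and writes. Below is a minimal sketch of that variant of the configuration above (the extensions class name is taken from the Iceberg docs; everything else mirrors my code, and I have not confirmed whether this is related to the error):
```
// Sketch only: same catalog settings as above, plus Iceberg's SQL extensions.
// Not verified to change the behavior of DELETE FROM in this setup.
val spark = SparkSession.builder()
  .config("spark.sql.extensions",
    "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
  .config("spark.sql.catalog.hadoop_prod", classOf[SparkCatalog].getName)
  .config("spark.sql.catalog.hadoop_prod.type", "hadoop")
  .config("spark.sql.catalog.hadoop_prod.warehouse",
    "hdfs://centos4:8020/doit/iceberg/warehouse/")
  .master("local[*]")
  .getOrCreate()
```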
Looking forward to your reply! Thank you!