[ 
https://issues.apache.org/jira/browse/SPARK-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-14055.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

Issue resolved by pull request 11875
[https://github.com/apache/spark/pull/11875]

> AssertionError may occur if writeLock is not released when calling the 
> 'removeBlock' method
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-14055
>                 URL: https://issues.apache.org/jira/browse/SPARK-14055
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager, Spark Core
>    Affects Versions: 2.0.0
>         Environment: Spark 2.0-SNAPSHOT
> Single Rack
> Standalone mode scheduling
> 8 node cluster
> 16 cores & 64G RAM / node
> Data Replication factor of 2
> Each node has 1 Spark executor configured with 16 cores and 40 GB of RAM.
>            Reporter: Ernest
>            Assignee: Ernest
>            Priority: Critical
>             Fix For: 2.0.0
>
>
> We got the following log when running _LiveJournalPageRank_.
> {quote}
> 452823:16/03/21 19:28:47.444 TRACE BlockInfoManager: Task 1662 trying to 
> acquire write lock for rdd_3_183
> 452825:16/03/21 19:28:47.445 TRACE BlockInfoManager: Task 1662 acquired write 
> lock for rdd_3_183
> 456941:16/03/21 19:28:47.596 INFO BlockManager: Dropping block rdd_3_183 from 
> memory
> 456943:16/03/21 19:28:47.597 DEBUG MemoryStore: Block rdd_3_183 of size 
> 418784648 dropped from memory (free 3504141600)
> 457027:16/03/21 19:28:47.600 DEBUG BlockManagerMaster: Updated info of block 
> rdd_3_183
> 457053:16/03/21 19:28:47.600 DEBUG BlockManager: Told master about block 
> rdd_3_183
> 457082:16/03/21 19:28:47.602 TRACE BlockInfoManager: Task 1662 trying to 
> remove block rdd_3_183
> 500373:16/03/21 19:28:49.893 TRACE BlockInfoManager: Task 1681 trying to put 
> rdd_3_183
> 500374:16/03/21 19:28:49.893 TRACE BlockInfoManager: Task 1681 trying to 
> acquire read lock for rdd_3_183
> 500375:16/03/21 19:28:49.893 TRACE BlockInfoManager: Task 1681 trying to 
> acquire write lock for rdd_3_183
> 500376:16/03/21 19:28:49.893 TRACE BlockInfoManager: Task 1681 acquired write 
> lock for rdd_3_183
> 517257:16/03/21 19:28:56.299 INFO BlockInfoManager: ****** taskAttemptId is: 
> 1662, info.writerTask is: 1681, blockID is: rdd_3_183 so AssertionError 
> happeneds here*****
> 517258-16/03/21 19:28:56.299 ERROR Executor: Exception in task 177.0 in stage 
> 10.0 (TID 1662)
> 517259-java.lang.AssertionError: assertion failed
> 517260- at scala.Predef$.assert(Predef.scala:151)
> 517261- at 
> org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$1$$anonfun$apply$1.apply(BlockInfoManager.scala:356)
> 517262- at 
> org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$1$$anonfun$apply$1.apply(BlockInfoManager.scala:351)
> 517263- at scala.Option.foreach(Option.scala:257)
> 517264- at 
> org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$1.apply(BlockInfoManager.scala:351)
> 517265- at 
> org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$1.apply(BlockInfoManager.scala:350)
> 517266- at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
> 517267- at 
> org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:350)
> 517268- at 
> org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:626)
> 517269- at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:238)
> {quote}
> When memory for RDD storage is insufficient and several partitions have to 
> be evicted, this _AssertionError_ may occur. 
> In the example above, several partitions (including rdd\_3\_183) needed to 
> be evicted while _Task 1662_ was running. _Task 1662_ first acquired the 
> read and write locks, then called _dropBlock_ from 
> _MemoryStore.evictBlocksToFreeSpace_, which actually dropped rdd\_3\_183 from 
> memory. Because _newEffectiveStorageLevel.isValid_ was false, execution went 
> into _BlockInfoManager.removeBlock_, but _writeLocksByTask_ was not updated 
> there, so it still recorded rdd\_3\_183 as write-locked by _Task 1662_.
> Unfortunately, _Task 1681_ had already started and needed to recompute 
> rdd\_3\_183 to produce its target RDD, so it acquired the write lock on 
> rdd\_3\_183. When _Task 1662_ finally called _releaseAllLocksForTask_, the 
> check that the block's writer task matched the releasing task failed 
> (info.writerTask was 1681, not 1662), and this _AssertionError_ occurred.
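
The sequence described above can be replayed with a small toy model. This is only a sketch in Python, not Spark's Scala code: the class ToyBlockInfoManager, its method names, and the fix_applied flag are hypothetical, and the remedy shown (dropping the block from the per-task write-lock set inside remove_block) is an assumption about one plausible fix, not a copy of pull request 11875.

```python
from collections import defaultdict

NO_WRITER = -1  # sentinel meaning "no task holds the write lock"


class BlockInfo:
    def __init__(self):
        self.writer_task = NO_WRITER


class ToyBlockInfoManager:
    """Hypothetical, simplified stand-in for the lock bookkeeping in the report."""

    def __init__(self, fix_applied):
        self.fix_applied = fix_applied
        self.infos = {}                              # block id -> BlockInfo
        self.write_locks_by_task = defaultdict(set)  # task id -> block ids

    def lock_new_block_for_writing(self, task, block_id):
        info = self.infos.setdefault(block_id, BlockInfo())
        if info.writer_task != NO_WRITER:
            raise RuntimeError("block is already write-locked")
        info.writer_task = task
        self.write_locks_by_task[task].add(block_id)

    def remove_block(self, task, block_id):
        # The caller must hold the write lock on the block it removes.
        if self.infos[block_id].writer_task != task:
            raise RuntimeError("caller does not hold the write lock")
        del self.infos[block_id]
        if self.fix_applied:
            # Assumed remedy: forget the lock when the block itself goes away.
            self.write_locks_by_task[task].discard(block_id)

    def release_all_locks_for_task(self, task):
        for block_id in self.write_locks_by_task.pop(task, set()):
            info = self.infos.get(block_id)
            if info is not None:
                # Mirrors the failing check: the recorded writer must be this task.
                if info.writer_task != task:
                    raise AssertionError("assertion failed")
                info.writer_task = NO_WRITER


def replay(fix_applied):
    """Replay the Task 1662 / Task 1681 interleaving from the log."""
    mgr = ToyBlockInfoManager(fix_applied)
    mgr.lock_new_block_for_writing(1662, "rdd_3_183")  # 1662 takes the write lock
    mgr.remove_block(1662, "rdd_3_183")                # eviction removes the block
    mgr.lock_new_block_for_writing(1681, "rdd_3_183")  # 1681 re-creates the block
    try:
        mgr.release_all_locks_for_task(1662)           # 1662 finishes
        return "ok"
    except AssertionError:
        return "AssertionError"
```

In this model, replay(False) hits the stale entry left behind for Task 1662 and raises, while replay(True) finds nothing left to release for Task 1662 and leaves the lock held by Task 1681 untouched.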


