[ 
https://issues.apache.org/jira/browse/SPARK-19556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15867152#comment-15867152
 ] 

Genmao Yu commented on SPARK-19556:
-----------------------------------

[~vanzin] I am working on this. Could you please assign it to me?


> Broadcast data is not encrypted when I/O encryption is on
> ---------------------------------------------------------
>
>                 Key: SPARK-19556
>                 URL: https://issues.apache.org/jira/browse/SPARK-19556
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: Marcelo Vanzin
>
> {{TorrentBroadcast}} uses a couple of "back doors" into the block manager to 
> write and read data:
> {code}
>       if (!blockManager.putBytes(pieceId, bytes, MEMORY_AND_DISK_SER, tellMaster = true)) {
>         throw new SparkException(s"Failed to store $pieceId of $broadcastId in local BlockManager")
>       }
> {code}
> {code}
>       bm.getLocalBytes(pieceId) match {
>         case Some(block) =>
>           blocks(pid) = block
>           releaseLock(pieceId)
>         case None =>
>           bm.getRemoteBytes(pieceId) match {
>             case Some(b) =>
>               if (checksumEnabled) {
>                 val sum = calcChecksum(b.chunks(0))
>                 if (sum != checksums(pid)) {
>                   throw new SparkException(s"corrupt remote block $pieceId of $broadcastId:" +
>                     s" $sum != ${checksums(pid)}")
>                 }
>               }
>               // We found the block from remote executors/driver's BlockManager, so put the block
>               // in this executor's BlockManager.
>               if (!bm.putBytes(pieceId, b, StorageLevel.MEMORY_AND_DISK_SER, tellMaster = true)) {
>                 throw new SparkException(
>                   s"Failed to store $pieceId of $broadcastId in local BlockManager")
>               }
>               blocks(pid) = b
>             case None =>
>               throw new SparkException(s"Failed to get $pieceId of $broadcastId")
>           }
>       }
> {code}
> These block manager methods all bypass the encryption code, so broadcast 
> data is stored unencrypted in the block manager and is written to disk 
> unencrypted if those blocks need to be evicted from memory.
> The correct fix here is actually not to change {{TorrentBroadcast}}, but to 
> fix the block manager so that:
> - data stored in memory is not encrypted
> - data written to disk is encrypted
> This would simplify the code paths that use BlockManager / SerializerManager 
> APIs (e.g. see SPARK-19520), but requires some tricky changes inside the 
> BlockManager to still be able to use file channels to avoid reading whole 
> blocks back into memory so they can be decrypted.
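
For illustration only, here is a minimal sketch of the behavior proposed in the description: blocks stay unencrypted in memory, are encrypted when written to disk, and are decrypted as a stream when read back, so the whole block never has to be loaded just to decrypt it. This is not Spark's BlockManager code. {{ToyBlockStore}}, its method names, the ".blk" file suffix and the IV-then-ciphertext file layout are all hypothetical, the AES/CTR/NoPadding transformation is an assumption, and plain JDK cipher streams stand in for the file channels mentioned above.

```scala
import java.io.{ByteArrayInputStream, DataInputStream, InputStream}
import java.nio.file.{Files, Path}
import java.security.SecureRandom
import javax.crypto.{Cipher, CipherInputStream, CipherOutputStream, SecretKey}
import javax.crypto.spec.IvParameterSpec
import scala.collection.mutable

// Hypothetical toy store (NOT Spark's BlockManager): plaintext in memory,
// ciphertext on disk.
final class ToyBlockStore(key: SecretKey, dir: Path) {
  private val transformation = "AES/CTR/NoPadding" // assumed for this sketch
  private val ivLength = 16
  private val memory = mutable.Map.empty[String, Array[Byte]]
  private val onDisk = mutable.Map.empty[String, Path]

  /** Stores a block in memory, unencrypted. */
  def putBytes(id: String, bytes: Array[Byte]): Unit = memory(id) = bytes.clone()

  /** Evicts a block from memory and writes it to disk as IV followed by ciphertext. */
  def evictToDisk(id: String): Path = {
    val bytes = memory.remove(id).getOrElse(throw new NoSuchElementException(id))
    val iv = new Array[Byte](ivLength)
    new SecureRandom().nextBytes(iv) // fresh IV for every file
    val cipher = Cipher.getInstance(transformation)
    cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv))
    val file = dir.resolve(id + ".blk")
    val fileOut = Files.newOutputStream(file)
    try {
      fileOut.write(iv)
      val cipherOut = new CipherOutputStream(fileOut, cipher)
      cipherOut.write(bytes)
      cipherOut.close() // finishes the cipher and closes fileOut as well
    } finally {
      fileOut.close() // no-op if already closed; releases the file on failure
    }
    onDisk(id) = file
    file
  }

  /** Opens a block as a plaintext stream; disk blocks are decrypted on the fly. */
  def openStream(id: String): InputStream = memory.get(id) match {
    case Some(bytes) => new ByteArrayInputStream(bytes)
    case None =>
      val file = onDisk.getOrElse(id, throw new NoSuchElementException(id))
      val fileIn = Files.newInputStream(file)
      try {
        val iv = new Array[Byte](ivLength)
        new DataInputStream(fileIn).readFully(iv) // consume exactly the IV prefix
        val cipher = Cipher.getInstance(transformation)
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv))
        new CipherInputStream(fileIn, cipher)
      } catch {
        case e: Throwable => fileIn.close(); throw e
      }
  }
}
```

With a store like this, a caller such as the broadcast code would only ever see plaintext bytes or a plaintext stream, and would not need to know whether a block currently lives in memory or on disk. That is the simplification of the calling code paths that the description is aiming for.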



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

