[ https://issues.apache.org/jira/browse/SPARK-19556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925067#comment-15925067 ]
Apache Spark commented on SPARK-19556:
--------------------------------------
User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/17295
> Broadcast data is not encrypted when I/O encryption is on
> ---------------------------------------------------------
>
> Key: SPARK-19556
> URL: https://issues.apache.org/jira/browse/SPARK-19556
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.0
> Reporter: Marcelo Vanzin
>
> {{TorrentBroadcast}} uses a couple of "back doors" into the block manager to
> write and read data:
> {code}
> if (!blockManager.putBytes(pieceId, bytes, MEMORY_AND_DISK_SER, tellMaster = true)) {
>   throw new SparkException(s"Failed to store $pieceId of $broadcastId in local BlockManager")
> }
> {code}
> {code}
> bm.getLocalBytes(pieceId) match {
>   case Some(block) =>
>     blocks(pid) = block
>     releaseLock(pieceId)
>   case None =>
>     bm.getRemoteBytes(pieceId) match {
>       case Some(b) =>
>         if (checksumEnabled) {
>           val sum = calcChecksum(b.chunks(0))
>           if (sum != checksums(pid)) {
>             throw new SparkException(s"corrupt remote block $pieceId of $broadcastId:" +
>               s" $sum != ${checksums(pid)}")
>           }
>         }
>         // We found the block from remote executors/driver's BlockManager, so put the block
>         // in this executor's BlockManager.
>         if (!bm.putBytes(pieceId, b, StorageLevel.MEMORY_AND_DISK_SER, tellMaster = true)) {
>           throw new SparkException(
>             s"Failed to store $pieceId of $broadcastId in local BlockManager")
>         }
>         blocks(pid) = b
>       case None =>
>         throw new SparkException(s"Failed to get $pieceId of $broadcastId")
>     }
> }
> {code}
> What these block manager methods have in common is that they bypass the
> encryption code, so broadcast data is stored unencrypted in the block
> manager and is written to disk unencrypted if those blocks need to be
> evicted from memory.
> The correct fix here is actually not to change {{TorrentBroadcast}}, but to
> fix the block manager so that:
> - data stored in memory is not encrypted
> - data written to disk is encrypted
> This would simplify the code paths that use BlockManager / SerializerManager
> APIs (e.g. see SPARK-19520), but requires some tricky changes inside the
> BlockManager to still be able to use file channels to avoid reading whole
> blocks back into memory so they can be decrypted.
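
The behaviour proposed in the description (plaintext in memory, ciphertext on disk, streaming reads) could be sketched roughly as follows. This is a toy illustration, not Spark's BlockManager and not the change in the pull request: `ToyBlockStore`, `evictToDisk`, `open` and `ToyBlockStoreDemo` are hypothetical names, and AES in CTR mode through the JDK's `javax.crypto` is an assumption chosen only to keep the example self-contained.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, InputStream}
import java.nio.file.{Files, Path}
import java.security.SecureRandom
import javax.crypto.{Cipher, CipherInputStream, CipherOutputStream}
import javax.crypto.spec.{IvParameterSpec, SecretKeySpec}
import scala.collection.mutable

// Hypothetical toy store (not Spark's BlockManager):
// blocks are kept as plaintext in memory and encrypted only when written to disk.
final class ToyBlockStore(key: Array[Byte], dir: Path) {
  private val transformation = "AES/CTR/NoPadding" // assumed cipher, for illustration only
  private val ivLength = 16
  private val memory = mutable.Map.empty[String, Array[Byte]]
  private val random = new SecureRandom()

  private def cipher(mode: Int, iv: Array[Byte]): Cipher = {
    val c = Cipher.getInstance(transformation)
    c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv))
    c
  }

  def fileFor(id: String): Path = dir.resolve(id)

  /** Store a block in memory as-is; no encryption happens on this path. */
  def putBytes(id: String, bytes: Array[Byte]): Unit = {
    memory.update(id, bytes)
  }

  /** Evict a block from memory, encrypting it on its way to disk. */
  def evictToDisk(id: String): Unit = {
    val bytes = memory.remove(id).getOrElse(
      throw new NoSuchElementException(s"Block $id is not in memory"))
    val iv = new Array[Byte](ivLength)
    random.nextBytes(iv)
    val fileOut = Files.newOutputStream(fileFor(id))
    try {
      fileOut.write(iv) // the IV is stored in the clear ahead of the ciphertext
      val out = new CipherOutputStream(fileOut, cipher(Cipher.ENCRYPT_MODE, iv))
      out.write(bytes)
      out.close() // finishes the cipher and closes the underlying file stream
    } finally fileOut.close()
  }

  /** Open a block as a stream; on-disk blocks are decrypted incrementally,
    * so the whole block never has to be loaded just to decrypt it. */
  def open(id: String): InputStream = memory.get(id) match {
    case Some(bytes) => new ByteArrayInputStream(bytes)
    case None =>
      val fileIn = Files.newInputStream(fileFor(id))
      val iv = new Array[Byte](ivLength)
      new DataInputStream(fileIn).readFully(iv) // consumes exactly the IV prefix
      new CipherInputStream(fileIn, cipher(Cipher.DECRYPT_MODE, iv))
  }
}

object ToyBlockStoreDemo {
  private def readAll(in: InputStream): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    val buf = new Array[Byte](4096)
    try {
      var n = in.read(buf)
      while (n != -1) { out.write(buf, 0, n); n = in.read(buf) }
    } finally in.close()
    out.toByteArray
  }

  /** Returns (raw bytes as stored on disk, bytes read back through the store). */
  def roundTrip(payload: Array[Byte]): (Array[Byte], Array[Byte]) = {
    val key = new Array[Byte](16) // throwaway 128-bit key for the demo
    new SecureRandom().nextBytes(key)
    val store = new ToyBlockStore(key, Files.createTempDirectory("toy-blocks"))
    val id = "broadcast_0_piece0"
    store.putBytes(id, payload)
    store.evictToDisk(id)
    (Files.readAllBytes(store.fileFor(id)), readAll(store.open(id)))
  }
}
```

Callers of `putBytes` and `open` never deal with encryption themselves, which is the simplification the description is after. A real implementation would additionally need to support file-channel based transfers, which this sketch does not attempt.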