[
https://issues.apache.org/jira/browse/SPARK-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15027697#comment-15027697
]
AJ commented on SPARK-8503:
---------------------------
I have experienced a similar problem with spark-1.5.1-bin-hadoop2.6 that I
don't believe is fixed yet. The problem appears to be that transient fields
are counted in the estimated size, even though they will never be serialized.
In SizeEstimator.scala:308, the code checks for and skips static fields, but
it never checks for transient fields. I think the fix might be as simple as:
for (field <- cls.getDeclaredFields) {
  // CHANGE THIS LINE:
  //   if (!Modifier.isStatic(field.getModifiers)) {
  // TO:
  if (!Modifier.isStatic(field.getModifiers) &&
      !Modifier.isTransient(field.getModifiers)) {
    val fieldClass = field.getType
    if (fieldClass.isPrimitive) {
      sizeCount(primitiveSize(fieldClass)) += 1
    } else {
      field.setAccessible(true) // Enable future get()'s on this field
      sizeCount(pointerSize) += 1
      pointerFields = field :: pointerFields
    }
  }
}
Unfortunately, I don't have time right now to compile a full spark build. Were
there any test cases that were added for this that would make it easy to verify?
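To make the proposed filter concrete, here is a minimal, self-contained Java sketch (the Modifier API is the same one SizeEstimator uses from Scala). The Node class and the countableFields helper are hypothetical illustrations, not Spark code; they only show that combining Modifier.isStatic and Modifier.isTransient skips exactly the fields serialization would skip:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class TransientFieldDemo {
    // Hypothetical recursive structure: the back-reference is transient,
    // so serialization never writes it, and size estimation shouldn't count it.
    static class Node {
        int value;             // primitive: should be counted
        Node child;            // regular reference: should be counted
        transient Node parent; // transient: should be skipped
        static int instances;  // static: already skipped today
    }

    // Mirrors the proposed condition: keep a field only if it is
    // neither static nor transient.
    static List<String> countableFields(Class<?> cls) {
        List<String> names = new ArrayList<>();
        for (Field field : cls.getDeclaredFields()) {
            int mods = field.getModifiers();
            if (!Modifier.isStatic(mods) && !Modifier.isTransient(mods)) {
                names.add(field.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // With the transient check, only "value" and "child" survive.
        System.out.println(countableFields(Node.class));
    }
}
```

With only the static check, the transient parent reference would also be walked, which is how the recursive cycle inflates (and eventually overflows) the estimate.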
> SizeEstimator returns negative value for recursive data structures
> ------------------------------------------------------------------
>
> Key: SPARK-8503
> URL: https://issues.apache.org/jira/browse/SPARK-8503
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.3.1
> Reporter: Ilya Rakitsin
>
> When estimating the size of recursive data structures like graphs, with
> transient fields referencing one another, SizeEstimator may return a negative
> value if the structure is big enough.
> This then affects the logic of other components, e.g.
> SizeTracker#takeSample() and may lead to incorrect behavior and exceptions
> like:
> java.lang.IllegalArgumentException: requirement failed: sizeInBytes was negative: -9223372036854691384
> at scala.Predef$.require(Predef.scala:233)
> at org.apache.spark.storage.BlockInfo.markReady(BlockInfo.scala:55)
> at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:810)
> at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:637)
> at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:991)
> at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:98)
> at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:84)
> at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
> at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
> at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
> at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1051)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]