The Kryo buffer default for Mahout should be bigger than Spark's default. It
may seem like a poor decision, but the fact is that the optimizer merges
("blockifies") matrix row partitions within a partition in a lazy way in
order to simplify (and perhaps even speed up) block-wise matrix algorithms.
As a result, there are situations when Spark may decide to put an entire
block on the wire as a single blob. That implies that the entire matrix
partition may need to fit into the Kryo buffer at times.

On Wed, Nov 19, 2014 at 1:04 PM, Pat Ferrel <[email protected]> wrote:

> For historical reasons some code I stole to do the similarity stuff had
> the following spark settings:
>
> sparkConf.set("spark.kryo.referenceTracking", "false")
>   .set("spark.kryoserializer.buffer.mb", "200") // todo: should this be
> left to config or an option?
>
> I'm not all that familiar with kryo. Are these things better left to a
> -D:key=value type param? Seems like they shouldn't be hard coded unless
> tracking is universal. Any opinions?
