Interesting - thanks for the advice, Matthias.

On Sun, Jul 16, 2017 at 3:48 PM, Matthias Boehm <mboe...@googlemail.com> wrote:
> Hi Anthony,
>
> other than for testing, we usually don't recommend running in Spark local
> mode, as it breaks the memory model of SystemML and can thus lead to OOMs.
> Regarding the memory configuration of a single-node setup, you're usually
> best served by allocating most of your memory to the driver, as we support
> multi-threaded single-node operations and this avoids the (unnecessary)
> overhead of distributed operations (w/ partial aggregation, constraints,
> and numerous other overheads).
>
> The only exception is nodes with very large memory and dense matrices
> that exceed 16GB. Our current dense matrix blocks (for single-node
> operations the entire matrix is represented as a single block) use a
> linearized array, which is limited to 2B elements in Java. In this
> situation, it can be beneficial to spend most of the memory on the
> executor to exploit all available memory and perform multi-threaded
> operations via pseudo-distributed execution. Note that there is the open
> SYSTEMML-1312 task to support large dense matrix blocks of more than
> 16GB, but it's unclear whether it will make it into the SystemML 1.0
> release.
>
> Regards,
> Matthias
>
> On Sun, Jul 16, 2017 at 3:25 PM, Anthony Thomas <ahtho...@eng.ucsd.edu>
> wrote:
>
> > Hi SystemML folks,
> >
> > Are there any recommended Spark configurations when running SystemML
> > on a single machine? I.e., is there a difference between launching
> > Spark with master=local[*] and running SystemML as a standard process
> > in the JVM, as opposed to launching a single-node Spark cluster? If
> > the latter, is there a recommended balance between driver and executor
> > memory?
> >
> > Thanks,
> >
> > Anthony
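
[Editor's note] For readers setting this up on a single node, here is a minimal
sketch of the two configurations Matthias describes, written as spark-submit
invocations against a single-node Spark cluster. The master URL, jar path,
script name, and memory sizes are placeholders for illustration only (assuming
roughly 100GB of usable memory on the node); adjust them to your build and
hardware.

    # Typical single-node case: give most memory to the driver so SystemML
    # can run its multi-threaded single-node operations there.
    spark-submit --master spark://localhost:7077 \
      --driver-memory 80g --executor-memory 4g \
      SystemML.jar -f myscript.dml -stats

    # Very large dense matrices (a single dense matrix above ~16GB): shift
    # most memory to the executor so the work runs as (pseudo-)distributed
    # operations over multiple blocks instead of one oversized dense block.
    spark-submit --master spark://localhost:7077 \
      --driver-memory 10g --executor-memory 80g \
      SystemML.jar -f myscript.dml -stats

The ~16GB threshold follows directly from the block representation mentioned
above: a linearized Java array holds at most 2^31-1 entries, and at 8 bytes
per double that is roughly 16GB for a single dense block.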