Thank you, Niketan. That information is very useful.

Deron
On Wed, Nov 18, 2015 at 8:25 AM, Niketan Pansare <[email protected]> wrote:

> Hi Deron,
>
> Please see the answers below:
>
> Are all the property name/values in this file the recommended SystemML
> configuration settings when running on a Hadoop cluster?
>
> Yes, but some are dependent on the size of the cluster (for example, the
> number of reducers), so the user might need to modify them accordingly.
>
> Are any of these properties of particular relevance when increasing
> performance for the cluster?
>
> Yes. Going back to the "number of reducers" example: if one has a 100-node
> cluster, using the default of 10 reducers would cause underutilization of
> the cluster.
>
> For example, I have a 4-node cluster with 3 data nodes. Should I change
> <numreducers> to be 2x the number of data nodes, so change from 10 to 6?
>
> 2x the number of nodes is a good rule of thumb for the number of reducers
> for the "MR" backend. I verified this in the performance experiments.
>
> Also, with regards to <optlevel>, what is being optimized and how does
> this affect performance?
>
> <optlevel> is a tuning flag for SystemML's runtime optimizer. I would
> recommend using the default optlevel. Here is the documentation:
>
> * Optimization Types for Compilation
> *
> * O0 STATIC - Decisions for scheduling operations on CP/MR are based on a
> *    predefined set of rules, which check if the dimensions are below a
> *    fixed/static threshold (OLD method of choosing between CP and MR).
> *    The optimization scope is LOCAL, i.e., per statement block.
> *    Advanced rewrites like constant folding, common subexpression
> *    elimination, or inter-procedural analysis are NOT applied.
> *
> * O1 MEMORY_BASED - Every operation is scheduled on CP or MR, solely
> *    based on the amount of memory required to perform that operation.
> *    It does NOT take the execution time into account.
> *    The optimization scope is LOCAL, i.e., per statement block.
> *    Advanced rewrites like constant folding, common subexpression
> *    elimination, or inter-procedural analysis are NOT applied.
> *
> * O2 MEMORY_BASED - Every operation is scheduled on CP or MR, solely
> *    based on the amount of memory required to perform that operation.
> *    It does NOT take the execution time into account.
> *    The optimization scope is LOCAL, i.e., per statement block.
> *    All advanced rewrites are applied. This is the default optimization
> *    level of SystemML.
> *
> * O3 GLOBAL TIME_MEMORY_BASED - Operation scheduling on CP or MR, as well
> *    as many other rewrites of data flow properties such as block size,
> *    partitioning, replication, vectorization, etc., are done with the
> *    optimization objective of minimizing execution time under hard memory
> *    constraints per operation and execution context. The optimization
> *    scope is GLOBAL, i.e., program-wide. All advanced rewrites are
> *    applied. This optimization level requires more optimization time but
> *    has higher optimization potential.
> *
> * O4 DEBUG MODE - All optimizations, global and local, which interfere
> *    with breakpoints are NOT applied. This optimization level is REQUIRED
> *    for the compiler running in debug mode.
> Thanks,
>
> Niketan Pansare
> IBM Almaden Research Center
> Phone (office): (408) 927 1740
> E-mail: [email protected]
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
> From: Deron Eriksson <[email protected]>
> To: [email protected]
> Date: 11/17/2015 07:31 PM
> Subject: SystemML-config.xml in distributed Hadoop environment
>
> Hello,
>
> The SystemML binary release comes with a SystemML configuration file
> (SystemML-config.xml) in its root directory. Are all the property
> name/values in this file the recommended SystemML configuration settings
> when running on a Hadoop cluster? Are any of these properties of
> particular relevance when increasing performance for the cluster?
>
> For example, I have a 4-node cluster with 3 data nodes. Should I change
> <numreducers> to be 2x the number of data nodes, so change from 10 to 6?
>
> Also, with regards to <optlevel>, what is being optimized and how does
> this affect performance?
>
> Thanks!
> Deron
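[Editor's note: for readers following the thread, the advice above would translate into a SystemML-config.xml fragment like the one below. This is only a sketch: it keeps the <numreducers> and <optlevel> property names discussed in the thread, sets numreducers to 6 per the "2x data nodes" rule of thumb for the 3-data-node cluster in the example, and keeps optlevel at the default of 2 (O2, memory-based with all advanced rewrites) as Niketan recommends. Other properties in the shipped file are omitted here.]

```xml
<root>
   <!-- Number of reducers for the MR backend; rule of thumb from the
        thread: 2x the number of data nodes (3 data nodes -> 6). -->
   <numreducers>6</numreducers>

   <!-- Optimization level of the SystemML runtime optimizer; 2 (O2) is
        the default and the recommended setting. -->
   <optlevel>2</optlevel>
</root>
```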
