Matthias Boehm created SYSTEMML-2487:

             Summary: Native Dnn operations crashing in over-provisioned parfor
                 Key: SYSTEMML-2487
             Project: SystemML
          Issue Type: Bug
            Reporter: Matthias Boehm

If a parfor loop does not consume all the available parallelism, we propagate 
the remaining parallelism down to the individual operations in its body, with 
slight (at most 50%) over-provisioning. For example, with 80 vcores and a parfor 
degree of parallelism of k=47, we still assign k=2 to each individual operation, 
i.e., 47 * 2 = 94 threads on 80 vcores.
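A minimal sketch of this rule, assuming that the per-operation parallelism is the rounded-up share of vcores per parfor worker, accepted only if the total thread count stays within 1.5x the vcores and rounded down otherwise. The exact cap rule as well as the names OverProvisioning and opParallelism are hypothetical and not taken from the SystemML code base.

```java
/**
 * Hypothetical sketch: propagate leftover parallelism from a parfor to the
 * operations in its body with at most 50% over-provisioning.
 * The cap rule and all names are assumptions, not SystemML's actual code.
 */
public class OverProvisioning {
    // Assumed cap: total threads may be at most 1.5x the available vcores.
    static final double MAX_OVERPROVISION = 1.5;

    /** Per-operation parallelism for a parfor with parforK workers on vcores cores. */
    static int opParallelism(int vcores, int parforK) {
        if (parforK <= 0 || vcores <= 0)
            throw new IllegalArgumentException("vcores and parforK must be positive");
        int floorK = Math.max(vcores / parforK, 1);     // rounds down, but at least 1
        int ceilK = (vcores + parforK - 1) / parforK;   // rounds up, may over-provision
        // Accept the rounded-up value only if the total thread count stays within the cap.
        return (long) parforK * ceilK <= MAX_OVERPROVISION * vcores ? ceilK : floorK;
    }

    public static void main(String[] args) {
        int vcores = 80, parforK = 47;
        int k = opParallelism(vcores, parforK);
        int threads = parforK * k;
        System.out.printf("k=%d, total threads=%d (%.1f%% over %d vcores)%n",
            k, threads, 100.0 * (threads - vcores) / vcores, vcores);
    }
}
```

For the example above, main prints "k=2, total threads=94 (17.5% over 80 vcores)". With k=70 instead, rounding up would yield 140 threads, which exceeds the assumed cap, so the sketch falls back to k=1.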

However, with native DNN operations, this over-provisioning causes JVM crashes 
such as the following:
# A fatal error has been detected by the Java Runtime Environment:
#  SIGFPE (0x8) at pc=0x00007f5de21902d6, pid=335027, tid=0x00007f5df8bcb700
# JRE version: OpenJDK Runtime Environment (8.0_161-b14) (build 1.8.0_161-b14)
# Java VM: OpenJDK 64-Bit Server VM (25.161-b14 mixed mode linux-amd64 )
# Problematic frame:
# C  [][thread 140041622857472 also had an error]

Hence, when native BLAS or DNN libraries are loaded, we should be more 
conservative and not over-provision at all. 
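The proposed fix could be sketched as below, reusing an assumed rounding rule (round up only if the total stays within 1.5x the vcores) and adding a hypothetical nativeLoaded flag that stands in for whatever check detects loaded native BLAS/DNN libraries. When the flag is set, the sketch always rounds down, so parfor workers times per-operation threads never exceed the vcores, as long as the parfor k itself does not. The class name ConservativeParallelism is likewise hypothetical.

```java
/**
 * Hypothetical sketch of the proposed fix: no over-provisioning at all when
 * native BLAS/DNN libraries are loaded. The nativeLoaded flag is an assumed
 * input, not an actual SystemML API.
 */
public class ConservativeParallelism {
    // Assumed cap for the non-native case: at most 1.5x the available vcores.
    static final double MAX_OVERPROVISION = 1.5;

    static int opParallelism(int vcores, int parforK, boolean nativeLoaded) {
        if (parforK <= 0 || vcores <= 0)
            throw new IllegalArgumentException("vcores and parforK must be positive");
        int floorK = Math.max(vcores / parforK, 1);     // total <= vcores if parforK <= vcores
        if (nativeLoaded)
            return floorK;                               // conservative: never over-provision
        int ceilK = (vcores + parforK - 1) / parforK;   // rounded up, may over-provision
        return (long) parforK * ceilK <= MAX_OVERPROVISION * vcores ? ceilK : floorK;
    }

    public static void main(String[] args) {
        System.out.println(opParallelism(80, 47, false)); // 2, i.e., 94 threads
        System.out.println(opParallelism(80, 47, true));  // 1, i.e., 47 threads
    }
}
```

Keeping the native check as a single early return leaves the existing over-provisioning path unchanged for pure-Java operations.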

This message was sent by Atlassian JIRA