[ https://issues.apache.org/jira/browse/SYSTEMML-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Dusenberry updated SYSTEMML-1078:
--------------------------------------
    Description: 
The following error occurs intermittently while training a model.  It 
appears that the sparsity of the intermediate matrices can change during 
training, and if one of them becomes sparse enough to be written in the 
"Ultra Sparse" format, serialization of the block during cache eviction 
fails with an internal runtime error because the *true* and *expected* 
numbers of non-zeros differ (842 vs. 2044 in the trace below).

{code}
Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception occurred while executing runtime program
        at org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:377)
        at org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:320)
        at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:287)
        ... 11 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in while program block generated from while statement block between lines 17 and 45 -- Error evaluating while program block
        at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:152)
        at org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:375)
        ... 13 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in while program block generated from while statement block between lines 17 and 45 -- Error evaluating while program block
        at org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:181)
        at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:145)
        ... 14 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 32 and 32 -- Error evaluating instruction: CP°extfunct°./mnist_lenet.dml°train°14°10°X·MATRIX·DOUBLE°Y·MATRIX·DOUBLE°X_val·MATRIX·DOUBLE°Y_val·MATRIX·DOUBLE°C·SCALAR·DOUBLE·false°Hin·SCALAR·DOUBLE·false°Win·SCALAR·DOUBLE·false°lr·SCALAR·DOUBLE·false°mu·SCALAR·DOUBLE·false°decay·SCALAR·DOUBLE·false°lambda·SCALAR·DOUBLE·false°50·SCALAR·INT·true°1·SCALAR·INT·true°iters·SCALAR·DOUBLE·false°Wc1°bc1°Wc2°bc2°Wc3°bc3°Wa1°ba1°Wa2°ba2
        at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:335)
        at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:224)
        at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
        at org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:169)
        ... 15 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: error executing function ./mnist_lenet.dml::train
        at org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:184)
        at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:305)
        ... 18 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in function program block generated from function statement block between lines 38 and 270 -- Error evaluating function program block
        at org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:121)
        at org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:177)
        ... 19 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in for program block generated from for statement block between lines 131 and 269 -- Error evaluating for program block
        at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:162)
        at org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:114)
        ... 20 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in for program block generated from for statement block between lines 132 and 244 -- Error evaluating for program block
        at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:162)
        at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:150)
        ... 21 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 157 and 217 -- Error evaluating instruction: CP°r'°outc3p·MATRIX·DOUBLE°_mVar1077501·MATRIX·DOUBLE°48
        at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:335)
        at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:224)
        at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
        at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:150)
        ... 22 more
Caused by: org.apache.sysml.runtime.controlprogram.caching.CacheException: Eviction to local path /tmp/systemml/_p6456_10.168.31.80//cache/cache000546482.dat (_mVar1077501) failed.
        at org.apache.sysml.runtime.controlprogram.caching.CacheableData.release(CacheableData.java:651)
        at org.apache.sysml.runtime.controlprogram.context.ExecutionContext.setMatrixOutput(ExecutionContext.java:426)
        at org.apache.sysml.runtime.instructions.cp.ReorgCPInstruction.processInstruction(ReorgCPInstruction.java:135)
        at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:305)
        ... 25 more
Caused by: java.io.IOException: Failed to serialize cache block.
        at org.apache.sysml.runtime.controlprogram.caching.ByteBuffer.serializeBlock(ByteBuffer.java:82)
        at org.apache.sysml.runtime.controlprogram.caching.LazyWriteBuffer.writeBlock(LazyWriteBuffer.java:113)
        at org.apache.sysml.runtime.controlprogram.caching.CacheableData.release(CacheableData.java:647)
        ... 28 more
Caused by: java.io.IOException: Invalid number of serialized non-zeros: 842 (expected: 2044)
        at org.apache.sysml.runtime.matrix.data.MatrixBlock.writeSparseToUltraSparse(MatrixBlock.java:2208)
        at org.apache.sysml.runtime.matrix.data.MatrixBlock.write(MatrixBlock.java:2073)
        at org.apache.sysml.runtime.controlprogram.caching.ByteBuffer.serializeBlock(ByteBuffer.java:73)
        ... 30 more
{code}
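
For readers unfamiliar with this kind of failure, the following is a minimal, hypothetical sketch of how a serializer can detect a stale non-zero count. It is not SystemML's implementation and does not claim to identify the root cause of this issue: `NnzCheckSketch`, `ToyBlock`, `cachedNnz`, and `writeTriples` are invented names, and only the wording of the exception message is borrowed from the trace above. The assumption illustrated is that a block keeps its non-zero count as metadata separate from its values, so the two can drift apart.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class NnzCheckSketch {

    /** Toy block (hypothetical): dense values plus a separately cached non-zero count. */
    static final class ToyBlock {
        final double[][] values;
        long cachedNnz; // metadata that must be kept in sync with the values by hand

        ToyBlock(double[][] values) {
            this.values = values;
            this.cachedNnz = countNonZeros(values);
        }
    }

    /** Counts the entries that are actually non-zero. */
    static long countNonZeros(double[][] values) {
        long n = 0;
        for (double[] row : values)
            for (double v : row)
                if (v != 0.0) n++;
        return n;
    }

    /** Writes (row, col, value) triples and checks the written count against the cached one. */
    static byte[] writeTriples(ToyBlock block) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        long written = 0;
        for (int r = 0; r < block.values.length; r++) {
            for (int c = 0; c < block.values[r].length; c++) {
                double v = block.values[r][c];
                if (v != 0.0) {
                    out.writeInt(r);
                    out.writeInt(c);
                    out.writeDouble(v);
                    written++;
                }
            }
        }
        if (written != block.cachedNnz) {
            throw new IOException("Invalid number of serialized non-zeros: "
                    + written + " (expected: " + block.cachedNnz + ")");
        }
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ToyBlock block = new ToyBlock(new double[][] {{0, 1.5, 0}, {0, 0, 2.0}, {3.0, 0, 0}});
        System.out.println(writeTriples(block).length); // 3 triples of 16 bytes each

        block.values[0][1] = 0.0; // values change, but cachedNnz is not updated
        try {
            writeTriples(block);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this toy version, recomputing `cachedNnz` after every update to `values` would keep the check from firing; whether anything similar applies to the real code path is not established by this report.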

  was:
Randomly during training of a model, the following error will occur.  One 
possibility is that it only occurs when the ultra sparse format is used.  I 
don't yet have a reproducible example.

{code}
Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception occurred while executing runtime program
        at org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:377)
        at org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:320)
        at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:287)
        ... 11 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in while program block generated from while statement block between lines 17 and 45 -- Error evaluating while program block
        at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:152)
        at org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:375)
        ... 13 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in while program block generated from while statement block between lines 17 and 45 -- Error evaluating while program block
        at org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:181)
        at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:145)
        ... 14 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 32 and 32 -- Error evaluating instruction: CP°extfunct°./mnist_lenet.dml°train°14°10°X·MATRIX·DOUBLE°Y·MATRIX·DOUBLE°X_val·MATRIX·DOUBLE°Y_val·MATRIX·DOUBLE°C·SCALAR·DOUBLE·false°Hin·SCALAR·DOUBLE·false°Win·SCALAR·DOUBLE·false°lr·SCALAR·DOUBLE·false°mu·SCALAR·DOUBLE·false°decay·SCALAR·DOUBLE·false°lambda·SCALAR·DOUBLE·false°50·SCALAR·INT·true°1·SCALAR·INT·true°iters·SCALAR·DOUBLE·false°Wc1°bc1°Wc2°bc2°Wc3°bc3°Wa1°ba1°Wa2°ba2
        at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:335)
        at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:224)
        at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
        at org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:169)
        ... 15 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: error executing function ./mnist_lenet.dml::train
        at org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:184)
        at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:305)
        ... 18 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in function program block generated from function statement block between lines 38 and 270 -- Error evaluating function program block
        at org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:121)
        at org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:177)
        ... 19 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in for program block generated from for statement block between lines 131 and 269 -- Error evaluating for program block
        at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:162)
        at org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:114)
        ... 20 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in for program block generated from for statement block between lines 132 and 244 -- Error evaluating for program block
        at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:162)
        at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:150)
        ... 21 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 157 and 217 -- Error evaluating instruction: CP°r'°outc3p·MATRIX·DOUBLE°_mVar1077501·MATRIX·DOUBLE°48
        at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:335)
        at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:224)
        at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
        at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:150)
        ... 22 more
Caused by: org.apache.sysml.runtime.controlprogram.caching.CacheException: Eviction to local path /tmp/systemml/_p6456_10.168.31.80//cache/cache000546482.dat (_mVar1077501) failed.
        at org.apache.sysml.runtime.controlprogram.caching.CacheableData.release(CacheableData.java:651)
        at org.apache.sysml.runtime.controlprogram.context.ExecutionContext.setMatrixOutput(ExecutionContext.java:426)
        at org.apache.sysml.runtime.instructions.cp.ReorgCPInstruction.processInstruction(ReorgCPInstruction.java:135)
        at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:305)
        ... 25 more
Caused by: java.io.IOException: Failed to serialize cache block.
        at org.apache.sysml.runtime.controlprogram.caching.ByteBuffer.serializeBlock(ByteBuffer.java:82)
        at org.apache.sysml.runtime.controlprogram.caching.LazyWriteBuffer.writeBlock(LazyWriteBuffer.java:113)
        at org.apache.sysml.runtime.controlprogram.caching.CacheableData.release(CacheableData.java:647)
        ... 28 more
Caused by: java.io.IOException: Invalid number of serialized non-zeros: 842 (expected: 2044)
        at org.apache.sysml.runtime.matrix.data.MatrixBlock.writeSparseToUltraSparse(MatrixBlock.java:2208)
        at org.apache.sysml.runtime.matrix.data.MatrixBlock.write(MatrixBlock.java:2073)
        at org.apache.sysml.runtime.controlprogram.caching.ByteBuffer.serializeBlock(ByteBuffer.java:73)
        ... 30 more
{code}


> Ultra Sparse Invalid number of serialized non-zeros
> ---------------------------------------------------
>
>                 Key: SYSTEMML-1078
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1078
>             Project: SystemML
>          Issue Type: Bug
>            Reporter: Mike Dusenberry
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
