Mike Dusenberry created SYSTEMML-1078:
-----------------------------------------
Summary: Ultra Sparse Invalid number of serialized non-zeros
Key: SYSTEMML-1078
URL: https://issues.apache.org/jira/browse/SYSTEMML-1078
Project: SystemML
Issue Type: Bug
Reporter: Mike Dusenberry
The following error occurs intermittently while training a model. It may occur
only when the ultra-sparse format is used: the innermost exception is thrown
from MatrixBlock.writeSparseToUltraSparse.
{code}
Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception occurred while executing runtime program
    at org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:377)
    at org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:320)
    at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:287)
    ... 11 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in while program block generated from while statement block between lines 17 and 45 -- Error evaluating while program block
    at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:152)
    at org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:375)
    ... 13 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in while program block generated from while statement block between lines 17 and 45 -- Error evaluating while program block
    at org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:181)
    at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:145)
    ... 14 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 32 and 32 -- Error evaluating instruction: CP°extfunct°./mnist_lenet.dml°train°14°10°X·MATRIX·DOUBLE°Y·MATRIX·DOUBLE°X_val·MATRIX·DOUBLE°Y_val·MATRIX·DOUBLE°C·SCALAR·DOUBLE·false°Hin·SCALAR·DOUBLE·false°Win·SCALAR·DOUBLE·false°lr·SCALAR·DOUBLE·false°mu·SCALAR·DOUBLE·false°decay·SCALAR·DOUBLE·false°lambda·SCALAR·DOUBLE·false°50·SCALAR·INT·true°1·SCALAR·INT·true°iters·SCALAR·DOUBLE·false°Wc1°bc1°Wc2°bc2°Wc3°bc3°Wa1°ba1°Wa2°ba2
    at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:335)
    at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:224)
    at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
    at org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:169)
    ... 15 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: error executing function ./mnist_lenet.dml::train
    at org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:184)
    at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:305)
    ... 18 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in function program block generated from function statement block between lines 38 and 270 -- Error evaluating function program block
    at org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:121)
    at org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:177)
    ... 19 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in for program block generated from for statement block between lines 131 and 269 -- Error evaluating for program block
    at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:162)
    at org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:114)
    ... 20 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in for program block generated from for statement block between lines 132 and 244 -- Error evaluating for program block
    at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:162)
    at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:150)
    ... 21 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 157 and 217 -- Error evaluating instruction: CP°r'°outc3p·MATRIX·DOUBLE°_mVar1077501·MATRIX·DOUBLE°48
    at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:335)
    at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:224)
    at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
    at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:150)
    ... 22 more
Caused by: org.apache.sysml.runtime.controlprogram.caching.CacheException: Eviction to local path /tmp/systemml/_p6456_10.168.31.80//cache/cache000546482.dat (_mVar1077501) failed.
    at org.apache.sysml.runtime.controlprogram.caching.CacheableData.release(CacheableData.java:651)
    at org.apache.sysml.runtime.controlprogram.context.ExecutionContext.setMatrixOutput(ExecutionContext.java:426)
    at org.apache.sysml.runtime.instructions.cp.ReorgCPInstruction.processInstruction(ReorgCPInstruction.java:135)
    at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:305)
    ... 25 more
Caused by: java.io.IOException: Failed to serialize cache block.
    at org.apache.sysml.runtime.controlprogram.caching.ByteBuffer.serializeBlock(ByteBuffer.java:82)
    at org.apache.sysml.runtime.controlprogram.caching.LazyWriteBuffer.writeBlock(LazyWriteBuffer.java:113)
    at org.apache.sysml.runtime.controlprogram.caching.CacheableData.release(CacheableData.java:647)
    ... 28 more
Caused by: java.io.IOException: Invalid number of serialized non-zeros: 842 (expected: 2044)
    at org.apache.sysml.runtime.matrix.data.MatrixBlock.writeSparseToUltraSparse(MatrixBlock.java:2208)
    at org.apache.sysml.runtime.matrix.data.MatrixBlock.write(MatrixBlock.java:2073)
    at org.apache.sysml.runtime.controlprogram.caching.ByteBuffer.serializeBlock(ByteBuffer.java:73)
    ... 30 more
{code}
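For readers unfamiliar with this kind of check, the sketch below shows one way an "Invalid number of serialized non-zeros" error can arise: a writer emits one record per non-zero cell and compares the number written with a non-zero count tracked separately from the data. This is a hypothetical illustration only. The class UltraSparseNnzSketch, its methods, and the (row, column, value) record layout are assumptions made for this example and are not taken from SystemML's MatrixBlock. The report does not establish the actual cause of the mismatch.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical illustration; NOT SystemML's MatrixBlock implementation.
final class UltraSparseNnzSketch {

    // Counts the cells that are actually non-zero.
    static long countNonZeros(double[][] rows) {
        long count = 0;
        for (double[] row : rows)
            for (double v : row)
                if (v != 0) count++;
        return count;
    }

    // Writes one (row, col, value) record per non-zero cell (assumed layout),
    // then checks the number written against the caller-supplied count.
    static byte[] writeTriples(double[][] rows, long declaredNnz) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        long written = 0;
        for (int r = 0; r < rows.length; r++) {
            for (int c = 0; c < rows[r].length; c++) {
                if (rows[r][c] != 0) {
                    out.writeInt(r);
                    out.writeInt(c);
                    out.writeDouble(rows[r][c]);
                    written++;
                }
            }
        }
        out.flush();
        if (written != declaredNnz)
            throw new IOException("Invalid number of serialized non-zeros: "
                + written + " (expected: " + declaredNnz + ")");
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        double[][] block = { {0, 3.5, 0}, {0, 0, 0}, {1.0, 0, 2.0} };
        long nnz = countNonZeros(block);

        // Count and data agree, so serialization succeeds.
        System.out.println(writeTriples(block, nnz).length + " bytes written");

        // A cell is cleared without refreshing the count, so the check fails.
        block[2][0] = 0;
        try {
            writeTriples(block, nnz);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the failure comes from a count that went stale after the data changed. A count that was wrong when first computed, or data modified concurrently during the write, would trip the same check, so the message alone does not distinguish between these cases.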