Mike Dusenberry created SYSTEMML-1078:
-----------------------------------------

             Summary: Ultra Sparse Invalid number of serialized non-zeros
                 Key: SYSTEMML-1078
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1078
             Project: SystemML
          Issue Type: Bug
            Reporter: Mike Dusenberry


The following error occurs intermittently while training a model.  One 
possibility is that it only occurs when the ultra-sparse format is used.

{code}
Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception occurred while executing runtime program
        at org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:377)
        at org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:320)
        at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:287)
        ... 11 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in while program block generated from while statement block between lines 17 and 45 -- Error evaluating while program block
        at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:152)
        at org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:375)
        ... 13 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in while program block generated from while statement block between lines 17 and 45 -- Error evaluating while program block
        at org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:181)
        at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:145)
        ... 14 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 32 and 32 -- Error evaluating instruction: CP°extfunct°./mnist_lenet.dml°train°14°10°X·MATRIX·DOUBLE°Y·MATRIX·DOUBLE°X_val·MATRIX·DOUBLE°Y_val·MATRIX·DOUBLE°C·SCALAR·DOUBLE·false°Hin·SCALAR·DOUBLE·false°Win·SCALAR·DOUBLE·false°lr·SCALAR·DOUBLE·false°mu·SCALAR·DOUBLE·false°decay·SCALAR·DOUBLE·false°lambda·SCALAR·DOUBLE·false°50·SCALAR·INT·true°1·SCALAR·INT·true°iters·SCALAR·DOUBLE·false°Wc1°bc1°Wc2°bc2°Wc3°bc3°Wa1°ba1°Wa2°ba2
        at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:335)
        at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:224)
        at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
        at org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:169)
        ... 15 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: error executing function ./mnist_lenet.dml::train
        at org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:184)
        at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:305)
        ... 18 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in function program block generated from function statement block between lines 38 and 270 -- Error evaluating function program block
        at org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:121)
        at org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:177)
        ... 19 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in for program block generated from for statement block between lines 131 and 269 -- Error evaluating for program block
        at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:162)
        at org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:114)
        ... 20 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in for program block generated from for statement block between lines 132 and 244 -- Error evaluating for program block
        at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:162)
        at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:150)
        ... 21 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 157 and 217 -- Error evaluating instruction: CP°r'°outc3p·MATRIX·DOUBLE°_mVar1077501·MATRIX·DOUBLE°48
        at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:335)
        at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:224)
        at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
        at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:150)
        ... 22 more
Caused by: org.apache.sysml.runtime.controlprogram.caching.CacheException: Eviction to local path /tmp/systemml/_p6456_10.168.31.80//cache/cache000546482.dat (_mVar1077501) failed.
        at org.apache.sysml.runtime.controlprogram.caching.CacheableData.release(CacheableData.java:651)
        at org.apache.sysml.runtime.controlprogram.context.ExecutionContext.setMatrixOutput(ExecutionContext.java:426)
        at org.apache.sysml.runtime.instructions.cp.ReorgCPInstruction.processInstruction(ReorgCPInstruction.java:135)
        at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:305)
        ... 25 more
Caused by: java.io.IOException: Failed to serialize cache block.
        at org.apache.sysml.runtime.controlprogram.caching.ByteBuffer.serializeBlock(ByteBuffer.java:82)
        at org.apache.sysml.runtime.controlprogram.caching.LazyWriteBuffer.writeBlock(LazyWriteBuffer.java:113)
        at org.apache.sysml.runtime.controlprogram.caching.CacheableData.release(CacheableData.java:647)
        ... 28 more
Caused by: java.io.IOException: Invalid number of serialized non-zeros: 842 (expected: 2044)
        at org.apache.sysml.runtime.matrix.data.MatrixBlock.writeSparseToUltraSparse(MatrixBlock.java:2208)
        at org.apache.sysml.runtime.matrix.data.MatrixBlock.write(MatrixBlock.java:2073)
        at org.apache.sysml.runtime.controlprogram.caching.ByteBuffer.serializeBlock(ByteBuffer.java:73)
        ... 30 more
{code}
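
The innermost exception in the trace above comes from a consistency check during ultra-sparse serialization: the number of non-zeros actually written (842) did not match the expected count (2044). The following is only a minimal sketch of how such a check can fail, not SystemML's actual code. The class UltraSparseCheckSketch, its methods, the byte layout, and the idea that the expected count is a cached value that has drifted from the data are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for a sparse block writer; not part of SystemML.
public class UltraSparseCheckSketch {

    // Writes every non-zero cell as a (row, col, value) triple and verifies
    // that the number of triples written matches the caller's cached count.
    // Assumption: cachedNnz models non-zero metadata kept alongside the data.
    static byte[] writeUltraSparse(double[][] rows, long cachedNnz) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeLong(cachedNnz); // header: how many triples a reader should expect
        long written = 0;
        for (int r = 0; r < rows.length; r++) {
            for (int c = 0; c < rows[r].length; c++) {
                if (rows[r][c] != 0) {
                    out.writeInt(r);
                    out.writeInt(c);
                    out.writeDouble(rows[r][c]);
                    written++;
                }
            }
        }
        // The header is already written, so a mismatch means the output is unusable.
        if (written != cachedNnz) {
            throw new IOException("Invalid number of serialized non-zeros: "
                    + written + " (expected: " + cachedNnz + ")");
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Recomputes the non-zero count directly from the data.
    static long countNonZeros(double[][] rows) {
        long nnz = 0;
        for (double[] row : rows) {
            for (double v : row) {
                if (v != 0) {
                    nnz++;
                }
            }
        }
        return nnz;
    }

    public static void main(String[] args) throws IOException {
        double[][] data = { {0, 0, 3.5}, {0, 0, 0}, {1.0, 0, 0} };

        // Consistent metadata: serialization succeeds.
        byte[] ok = writeUltraSparse(data, countNonZeros(data));
        System.out.println("serialized bytes: " + ok.length);

        // Stale metadata (claims 5 non-zeros, data holds 2): the check fails.
        try {
            writeUltraSparse(data, 5);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the real cause is similar, recomputing the non-zero count from the data before the write (as countNonZeros does here) would avoid the exception, though it would not explain why the cached count drifted in the first place.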



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
