[ https://issues.apache.org/jira/browse/SYSTEMML-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias Boehm closed SYSTEMML-1078.
------------------------------------
Resolution: Fixed
Assignee: Matthias Boehm
Fix Version/s: SystemML 1.1
I'm closing this issue because there is no reproducible scenario and it has
likely been fixed by the recent sparse block fixes SYSTEMML-1959,
SYSTEMML-2035, SYSTEMML-2051, SYSTEMML-2052, and SYSTEMML-2098.
> Ultra Sparse Invalid number of serialized non-zeros
> ---------------------------------------------------
>
> Key: SYSTEMML-1078
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1078
> Project: SystemML
> Issue Type: Bug
> Reporter: Mike Dusenberry
> Assignee: Matthias Boehm
> Priority: Blocker
> Fix For: SystemML 1.1
>
>
> The following error occurs intermittently while training a model. It
> appears that the characteristics of the intermediate matrices change over
> the course of training, and when one of them becomes sparse enough to fall
> into the "Ultra Sparse" category, serializing it during cache eviction fails
> with an internal error because the *actual* and *expected* numbers of
> non-zeros diverge.
> {code}
> Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception occurred while executing runtime program
> at org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:377)
> at org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:320)
> at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:287)
> ... 11 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in while program block generated from while statement block between lines 17 and 45 -- Error evaluating while program block
> at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:152)
> at org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:375)
> ... 13 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in while program block generated from while statement block between lines 17 and 45 -- Error evaluating while program block
> at org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:181)
> at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:145)
> ... 14 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 32 and 32 -- Error evaluating instruction: CP°extfunct°./mnist_lenet.dml°train°14°10°X·MATRIX·DOUBLE°Y·MATRIX·DOUBLE°X_val·MATRIX·DOUBLE°Y_val·MATRIX·DOUBLE°C·SCALAR·DOUBLE·false°Hin·SCALAR·DOUBLE·false°Win·SCALAR·DOUBLE·false°lr·SCALAR·DOUBLE·false°mu·SCALAR·DOUBLE·false°decay·SCALAR·DOUBLE·false°lambda·SCALAR·DOUBLE·false°50·SCALAR·INT·true°1·SCALAR·INT·true°iters·SCALAR·DOUBLE·false°Wc1°bc1°Wc2°bc2°Wc3°bc3°Wa1°ba1°Wa2°ba2
> at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:335)
> at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:224)
> at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
> at org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:169)
> ... 15 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: error executing function ./mnist_lenet.dml::train
> at org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:184)
> at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:305)
> ... 18 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in function program block generated from function statement block between lines 38 and 270 -- Error evaluating function program block
> at org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:121)
> at org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:177)
> ... 19 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in for program block generated from for statement block between lines 131 and 269 -- Error evaluating for program block
> at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:162)
> at org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:114)
> ... 20 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in for program block generated from for statement block between lines 132 and 244 -- Error evaluating for program block
> at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:162)
> at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:150)
> ... 21 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 157 and 217 -- Error evaluating instruction: CP°r'°outc3p·MATRIX·DOUBLE°_mVar1077501·MATRIX·DOUBLE°48
> at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:335)
> at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:224)
> at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
> at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:150)
> ... 22 more
> Caused by: org.apache.sysml.runtime.controlprogram.caching.CacheException: Eviction to local path /tmp/systemml/_p6456_10.168.31.80//cache/cache000546482.dat (_mVar1077501) failed.
> at org.apache.sysml.runtime.controlprogram.caching.CacheableData.release(CacheableData.java:651)
> at org.apache.sysml.runtime.controlprogram.context.ExecutionContext.setMatrixOutput(ExecutionContext.java:426)
> at org.apache.sysml.runtime.instructions.cp.ReorgCPInstruction.processInstruction(ReorgCPInstruction.java:135)
> at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:305)
> ... 25 more
> Caused by: java.io.IOException: Failed to serialize cache block.
> at org.apache.sysml.runtime.controlprogram.caching.ByteBuffer.serializeBlock(ByteBuffer.java:82)
> at org.apache.sysml.runtime.controlprogram.caching.LazyWriteBuffer.writeBlock(LazyWriteBuffer.java:113)
> at org.apache.sysml.runtime.controlprogram.caching.CacheableData.release(CacheableData.java:647)
> ... 28 more
> Caused by: java.io.IOException: Invalid number of serialized non-zeros: 842 (expected: 2044)
> at org.apache.sysml.runtime.matrix.data.MatrixBlock.writeSparseToUltraSparse(MatrixBlock.java:2208)
> at org.apache.sysml.runtime.matrix.data.MatrixBlock.write(MatrixBlock.java:2073)
> at org.apache.sysml.runtime.controlprogram.caching.ByteBuffer.serializeBlock(ByteBuffer.java:73)
> ... 30 more
> {code}
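
For readers unfamiliar with this class of failure, the innermost exception in the trace above reads like a consistency check between a block's cached non-zero count and the number of entries actually written. The following is a minimal sketch of how such a check could fail when the cached count goes stale. It is not SystemML's code: the class `UltraSparseSketch`, the `Block` type, its `cachedNnz` field, and the byte layout are all hypothetical and chosen only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Illustrative sketch only; all names and the on-disk layout are assumptions.
final class UltraSparseSketch {

    /** Hypothetical block: cell values plus a separately maintained non-zero count. */
    static final class Block {
        final double[][] values;
        final long cachedNnz; // metadata; goes stale if an operation forgets to update it

        Block(double[][] values, long cachedNnz) {
            this.values = values;
            this.cachedNnz = cachedNnz;
        }
    }

    /** Recounts the non-zeros directly from the cell values. */
    static long countNnz(double[][] values) {
        long nnz = 0;
        for (double[] row : values)
            for (double v : row)
                if (v != 0) nnz++;
        return nnz;
    }

    /** Writes a count header followed by (row, column, value) triples. */
    static byte[] writeUltraSparse(Block b) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeLong(b.cachedNnz); // header trusts the cached metadata
        long written = 0;
        for (int r = 0; r < b.values.length; r++) {
            for (int c = 0; c < b.values[r].length; c++) {
                double v = b.values[r][c];
                if (v != 0) {
                    out.writeInt(r);
                    out.writeInt(c);
                    out.writeDouble(v);
                    written++;
                }
            }
        }
        // A reader would expect exactly cachedNnz triples, so a mismatch is fatal.
        if (written != b.cachedNnz)
            throw new IOException("Invalid number of serialized non-zeros: "
                + written + " (expected: " + b.cachedNnz + ")");
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        double[][] v = new double[4][4];
        v[0][1] = 1.5;
        v[3][2] = -2.0;

        // Consistent metadata: serialization succeeds.
        Block consistent = new Block(v, countNnz(v));
        System.out.println(writeUltraSparse(consistent).length + " bytes written");

        // Stale metadata: the check fails, as in the reported trace.
        Block stale = new Block(v, 5);
        try {
            writeUltraSparse(stale);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this reading, the error is a symptom rather than the defect itself: the inconsistency would be introduced by whichever earlier operation produced the block without updating its count, which fits the closing comment attributing the fix to several separate sparse block issues.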