[ 
https://issues.apache.org/jira/browse/SYSTEMML-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-1078.
------------------------------------
       Resolution: Fixed
         Assignee: Matthias Boehm
    Fix Version/s: SystemML 1.1

I'm closing this issue because there is no reproducible scenario and it has 
likely been fixed by the recent sparse block fixes SYSTEMML-1959, 
SYSTEMML-2035, SYSTEMML-2051, SYSTEMML-2052, and SYSTEMML-2098.

> Ultra Sparse Invalid number of serialized non-zeros
> ---------------------------------------------------
>
>                 Key: SYSTEMML-1078
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1078
>             Project: SystemML
>          Issue Type: Bug
>            Reporter: Mike Dusenberry
>            Assignee: Matthias Boehm
>            Priority: Blocker
>             Fix For: SystemML 1.1
>
>
> The following error occurs intermittently during model training.  It 
> appears that the characteristics of the intermediate matrices can change 
> over the course of training, and if one of them becomes sparse enough to 
> fall into the "Ultra Sparse" category, serialization fails with an internal 
> error because the *true* and *expected* numbers of non-zeros diverge.
> {code}
> Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception 
> occurred while executing runtime program
>       at 
> org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:377)
>       at 
> org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:320)
>       at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:287)
>       ... 11 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: 
> org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in while 
> program block generated from while statement block between lines 17 and 45 -- 
> Error evaluating while program block
>       at 
> org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:152)
>       at 
> org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:375)
>       ... 13 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error 
> in while program block generated from while statement block between lines 17 
> and 45 -- Error evaluating while program block
>       at 
> org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:181)
>       at 
> org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:145)
>       ... 14 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error 
> in program block generated from statement block between lines 32 and 32 -- 
> Error evaluating instruction: 
> CP°extfunct°./mnist_lenet.dml°train°14°10°X·MATRIX·DOUBLE°Y·MATRIX·DOUBLE°X_val·MATRIX·DOUBLE°Y_val·MATRIX·DOUBLE°C·SCALAR·DOUBLE·false°Hin·SCALAR·DOUBLE·false°Win·SCALAR·DOUBLE·false°lr·SCALAR·DOUBLE·false°mu·SCALAR·DOUBLE·false°decay·SCALAR·DOUBLE·false°lambda·SCALAR·DOUBLE·false°50·SCALAR·INT·true°1·SCALAR·INT·true°iters·SCALAR·DOUBLE·false°Wc1°bc1°Wc2°bc2°Wc3°bc3°Wa1°ba1°Wa2°ba2
>       at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:335)
>       at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:224)
>       at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
>       at 
> org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:169)
>       ... 15 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: error executing 
> function ./mnist_lenet.dml::train
>       at 
> org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:184)
>       at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:305)
>       ... 18 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error 
> in function program block generated from function statement block between 
> lines 38 and 270 -- Error evaluating function program block
>       at 
> org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:121)
>       at 
> org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:177)
>       ... 19 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error 
> in for program block generated from for statement block between lines 131 and 
> 269 -- Error evaluating for program block
>       at 
> org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:162)
>       at 
> org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:114)
>       ... 20 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error 
> in for program block generated from for statement block between lines 132 and 
> 244 -- Error evaluating for program block
>       at 
> org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:162)
>       at 
> org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:150)
>       ... 21 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error 
> in program block generated from statement block between lines 157 and 217 -- 
> Error evaluating instruction: 
> CP°r'°outc3p·MATRIX·DOUBLE°_mVar1077501·MATRIX·DOUBLE°48
>       at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:335)
>       at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:224)
>       at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
>       at 
> org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:150)
>       ... 22 more
> Caused by: org.apache.sysml.runtime.controlprogram.caching.CacheException: 
> Eviction to local path 
> /tmp/systemml/_p6456_10.168.31.80//cache/cache000546482.dat (_mVar1077501) 
> failed.
>       at 
> org.apache.sysml.runtime.controlprogram.caching.CacheableData.release(CacheableData.java:651)
>       at 
> org.apache.sysml.runtime.controlprogram.context.ExecutionContext.setMatrixOutput(ExecutionContext.java:426)
>       at 
> org.apache.sysml.runtime.instructions.cp.ReorgCPInstruction.processInstruction(ReorgCPInstruction.java:135)
>       at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:305)
>       ... 25 more
> Caused by: java.io.IOException: Failed to serialize cache block.
>       at 
> org.apache.sysml.runtime.controlprogram.caching.ByteBuffer.serializeBlock(ByteBuffer.java:82)
>       at 
> org.apache.sysml.runtime.controlprogram.caching.LazyWriteBuffer.writeBlock(LazyWriteBuffer.java:113)
>       at 
> org.apache.sysml.runtime.controlprogram.caching.CacheableData.release(CacheableData.java:647)
>       ... 28 more
> Caused by: java.io.IOException: Invalid number of serialized non-zeros: 842 
> (expected: 2044)
>       at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.writeSparseToUltraSparse(MatrixBlock.java:2208)
>       at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.write(MatrixBlock.java:2073)
>       at 
> org.apache.sysml.runtime.controlprogram.caching.ByteBuffer.serializeBlock(ByteBuffer.java:73)
>       ... 30 more
> {code}
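
The innermost exception in the trace above reports that the number of non-zeros written during serialization (842) did not match the number recorded for the block (2044). The sketch below illustrates that kind of consistency check in isolation. It is not SystemML's MatrixBlock code: the class NnzCheckSketch, its methods, and the (row, column, value) triple layout are assumptions made for illustration only, and the error text is simply modeled on the message in the trace.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical illustration only; not taken from SystemML.
public class NnzCheckSketch {

    /** Counts the non-zero cells that are actually present in the rows. */
    public static long countNonZeros(double[][] rows) {
        long count = 0;
        for (double[] row : rows)
            for (double v : row)
                if (v != 0) count++;
        return count;
    }

    /**
     * Writes a (row, column, value) triple for every non-zero cell and fails
     * if the number of triples written differs from the recorded nnz.
     * The triple layout is an assumption for this sketch.
     */
    public static byte[] writeTriples(double[][] rows, long recordedNnz) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeLong(recordedNnz); // header announces how many triples follow
        long written = 0;
        for (int r = 0; r < rows.length; r++) {
            for (int c = 0; c < rows[r].length; c++) {
                if (rows[r][c] != 0) {
                    out.writeInt(r);
                    out.writeInt(c);
                    out.writeDouble(rows[r][c]);
                    written++;
                }
            }
        }
        // A reader would trust the header, so a mismatch must not pass silently.
        if (written != recordedNnz)
            throw new IOException("Invalid number of serialized non-zeros: "
                + written + " (expected: " + recordedNnz + ")");
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        double[][] block = new double[4][1000];
        block[0][3] = 1.5;
        block[2][7] = -2.0;

        // Metadata agrees with the data: serialization succeeds.
        byte[] ok = writeTriples(block, countNonZeros(block));
        System.out.println("serialized bytes: " + ok.length);

        // Stale metadata (recorded nnz no longer matches the data): fails.
        try {
            writeTriples(block, 5);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the failure can only come from the recorded count being out of date relative to the stored values, which is one plausible reading of the report; the actual root cause in SystemML is not established by this thread.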



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
