[ 
https://issues.apache.org/jira/browse/SYSTEMML-413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396904#comment-15396904
 ] 

Matthias Boehm commented on SYSTEMML-413:
-----------------------------------------

[~freiss] That's a good start. A couple of additions:

1) Input/output: All readers and writers are in 'org.apache.sysml.runtime.io'. 
As with the new frame readers and writers, the existing sequential and 
parallel readers should be consolidated as well. Casting functionality and 
conversion to/from external representations can be found in 
org.apache.sysml.runtime.util.DataConverter.
2) Operation libraries: Some of the performance-critical code lives in our 
LibMatrix* classes. I would like to keep these classes isolated, especially 
LibMatrixMult, LibMatrixDatagen, LibMatrixReorg, LibMatrixBincell, and 
LibMatrixAgg, as they are already quite large.
3) Frames: Keep in mind that the buffer pool and some other components are 
implemented generically against the CacheBlock abstraction, which both 
MatrixBlock and FrameBlock implement. Any refactoring needs to preserve this 
abstraction.
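
To make item 3 concrete, here is a minimal sketch of why the CacheBlock abstraction constrains refactoring: the buffer pool below is written only against an interface, so it manages matrix and frame blocks alike, and any new dense/sparse classes would have to keep implementing that contract. CacheBlockLike, MatrixBlockStub, FrameBlockStub, and BufferPool are hypothetical, simplified stand-ins (not SystemML's actual CacheBlock methods or buffer pool), and the 64-bytes-per-cell frame estimate is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the CacheBlock abstraction;
// this method set is an assumption, not SystemML's real interface.
interface CacheBlockLike {
    int getNumRows();
    int getNumColumns();
    long getInMemorySize(); // estimated footprint in bytes
}

// Stand-in for MatrixBlock: dense double values only.
final class MatrixBlockStub implements CacheBlockLike {
    private final int rows, cols;
    private final double[] values;

    MatrixBlockStub(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.values = new double[rows * cols];
    }
    public int getNumRows() { return rows; }
    public int getNumColumns() { return cols; }
    public long getInMemorySize() { return 8L * values.length; }
}

// Stand-in for FrameBlock: string cells with an assumed per-cell cost.
final class FrameBlockStub implements CacheBlockLike {
    private static final long ASSUMED_BYTES_PER_CELL = 64;
    private final String[][] cells;

    FrameBlockStub(int rows, int cols) { this.cells = new String[rows][cols]; }
    public int getNumRows() { return cells.length; }
    public int getNumColumns() { return cells.length == 0 ? 0 : cells[0].length; }
    public long getInMemorySize() {
        return ASSUMED_BYTES_PER_CELL * getNumRows() * getNumColumns();
    }
}

// LRU buffer pool that only sees the interface, never the concrete block type.
final class BufferPool {
    private final long capacityBytes;
    private long usedBytes = 0;
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, CacheBlockLike> blocks =
        new LinkedHashMap<>(16, 0.75f, true);

    BufferPool(long capacityBytes) { this.capacityBytes = capacityBytes; }

    /** Inserts a block and returns the keys evicted to stay within capacity. */
    List<String> put(String key, CacheBlockLike block) {
        CacheBlockLike old = blocks.remove(key);
        if (old != null) usedBytes -= old.getInMemorySize();
        blocks.put(key, block);
        usedBytes += block.getInMemorySize();

        List<String> evicted = new ArrayList<>();
        Iterator<Map.Entry<String, CacheBlockLike>> it = blocks.entrySet().iterator();
        // Evict least recently used entries, but never the block just inserted.
        while (usedBytes > capacityBytes && blocks.size() > 1) {
            Map.Entry<String, CacheBlockLike> eldest = it.next();
            usedBytes -= eldest.getValue().getInMemorySize();
            evicted.add(eldest.getKey());
            it.remove();
        }
        return evicted;
    }

    boolean contains(String key) { return blocks.containsKey(key); }
    long usedBytes() { return usedBytes; }
}
```

With a 1000-byte pool, inserting a 10x10 matrix (800 bytes) and then a 2x2 frame (256 bytes) evicts the matrix. The pool never inspects the concrete type, which is exactly the property a MatrixBlock split would have to preserve.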

> Runtime refactoring core matrix block library
> ---------------------------------------------
>
>                 Key: SYSTEMML-413
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-413
>             Project: SystemML
>          Issue Type: Task
>          Components: Runtime
>            Reporter: Matthias Boehm
>
> Pull the local (non-distributed) linear algebra components of SystemML into a 
> separate package. Define a proper object-oriented Java API for creating and 
> manipulating local matrices. Document this API. Refactor all tests of local 
> linear algebra functionality so that those tests use the new API. Refactor 
> the distributed linear algebra operators (both Spark and Hadoop map-reduce) 
> to use the new APIs for local linear algebra. 
> *Overall Refactoring Plan*
> The MatrixBlock class will be the core locus of refactoring. The file is over 
> 6000 lines long, has dependencies on the HOPS and LOPS layers, and contains a 
> lot of sparse matrix code that really ought to be in SparseBlock. Even if 
> it’s modified in place, MatrixBlock will bear little resemblance to its 
> current form after the refactoring is completed. I recommend setting aside 
> the current MatrixBlock class and creating new classes with equivalent 
> functionality by copying appropriate blocks of code from the old class. 
> Major changes to make relative to MatrixBlock:
> * We should create a new DenseMatrixBlock class that only covers dense linear 
> algebra.
> * Sparse-specific code should be moved into the SparseBlock class. 
> * Common functionality across dense and sparse should go into the MatrixValue 
> superclass.
> * There should be a new class with a name like “Matrix” (we’ll need one 
> anyway to serve as the public API) that contains a pointer to a MatrixValue 
> and can switch between different representations. Ideally this class should 
> be designed so that, in the future, it can serve as a matrix ADT that will 
> wrap both local and distributed linear algebra.
> * Several fields (maxrow, maxcolumn, numGroups, and various estimates of 
> future numbers of nonzeros) are used for stashing data that is only for 
> internal SystemML use. Either put these into a different data structure or 
> provide a generic mechanism for tagging a matrix block with additional 
> application-specific data.
> * Clean up and simplify the multiple different initialization methods 
> (different variants of the constructors and the methods init() and reset()). 
> There should be one canonical method for each major type of initialization. 
> Other methods that are shortcuts (e.g., reset() with no arguments) should call 
> the canonical method internally.
> * Consider refactoring the variants of ternaryOperations() that support 
> ctable() into a simpler method called ctable(), perhaps with a Java API 
> that accepts null values for the optional arguments. 
> Other changes outside MatrixBlock:
> * The matrix classes currently depend on Hadoop I/O classes like Writable and 
> DataInputBuffer. A local linear algebra library really shouldn’t require 
> Hadoop. I/O methods that use Hadoop APIs should be factored out into a 
> separate package. In particular, MatrixValue needs to be separated from 
> Hadoop’s WritableComparable API.
> * The contents of the following packages need to move to the new library: 
> sysml.runtime.functionobjects and sysml.runtime.matrix.operators
> * The library will need local input and output functions. I haven’t found 
> suitable functions yet, but they may be hidden somewhere; if so, the existing 
> functions should be moved next to the other local linear algebra code.
> * Utility functions in classes under sysml.runtime.util will need to be 
> replicated.
> * The more obscure subclasses of MatrixValue (MatrixCell, WeightedCell, etc.) 
> do NOT need to be moved over.
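
A rough sketch of the proposed "Matrix" wrapper from the description: it holds a reference to a representation (standing in for MatrixValue) and swaps between dense and sparse layouts when the fraction of nonzeros crosses a threshold. MatrixRep, DenseRep, SparseRep, Matrix, examineSparsity(), and the 0.4 threshold are all assumptions for illustration; the map-based sparse layout is a placeholder for whatever SparseBlock format the library actually uses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the MatrixValue superclass: operations shared
// by every representation.
interface MatrixRep {
    int rows();
    int cols();
    double get(int r, int c);
    void set(int r, int c, double v);
    long nonZeros();
}

// Dense row-major layout.
final class DenseRep implements MatrixRep {
    private final int rows, cols;
    private final double[] data;

    DenseRep(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }
    public int rows() { return rows; }
    public int cols() { return cols; }
    public double get(int r, int c) { return data[r * cols + c]; }
    public void set(int r, int c, double v) { data[r * cols + c] = v; }
    public long nonZeros() {
        long n = 0;
        for (double d : data) if (d != 0) n++;
        return n;
    }
}

// Placeholder sparse layout: only nonzeros are stored, keyed by linear index.
final class SparseRep implements MatrixRep {
    private final int rows, cols;
    private final Map<Long, Double> cells = new HashMap<>();

    SparseRep(int rows, int cols) { this.rows = rows; this.cols = cols; }
    public int rows() { return rows; }
    public int cols() { return cols; }
    public double get(int r, int c) { return cells.getOrDefault((long) r * cols + c, 0.0); }
    public void set(int r, int c, double v) {
        long key = (long) r * cols + c;
        if (v == 0) cells.remove(key); else cells.put(key, v);
    }
    public long nonZeros() { return cells.size(); }
}

// Public-facing wrapper that owns one representation and may replace it.
final class Matrix {
    // Assumed dense/sparse switch-over point, for illustration only.
    static final double SPARSITY_THRESHOLD = 0.4;
    private MatrixRep rep;

    Matrix(int rows, int cols) { this.rep = new SparseRep(rows, cols); }

    double get(int r, int c) { return rep.get(r, c); }
    void set(int r, int c, double v) { rep.set(r, c, v); }
    boolean isSparse() { return rep instanceof SparseRep; }

    /** Re-selects dense or sparse storage from the current fraction of nonzeros. */
    void examineSparsity() {
        long cells = (long) rep.rows() * rep.cols();
        if (cells == 0) return;
        boolean wantSparse = (double) rep.nonZeros() / cells < SPARSITY_THRESHOLD;
        if (wantSparse == isSparse()) return;
        MatrixRep target = wantSparse
            ? new SparseRep(rep.rows(), rep.cols())
            : new DenseRep(rep.rows(), rep.cols());
        // Copy only nonzero cells into the new representation.
        for (int r = 0; r < rep.rows(); r++)
            for (int c = 0; c < rep.cols(); c++) {
                double v = rep.get(r, c);
                if (v != 0) target.set(r, c, v);
            }
        rep = target;
    }
}
```

For a 2x2 matrix with three nonzeros (sparsity 0.75), examineSparsity() converts to dense; zeroing two of those entries and calling it again converts back to sparse. Callers only ever use Matrix.get()/set(), which is what would let the representation (or, later, a distributed backend) change underneath.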
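
One way to satisfy the initialization bullet, sketched under assumptions: every constructor and every shortcut reset() delegates to a single canonical reset(rows, cols, sparse, estimatedNnz). LocalBlock and its fields are hypothetical and model only the dimension/format bookkeeping, not MatrixBlock's actual state.

```java
// Hypothetical block class: constructors and shortcut reset() variants all
// funnel into one canonical reset(rows, cols, sparse, estimatedNnz).
final class LocalBlock {
    private int rows, cols;
    private boolean sparse;
    private long estimatedNnz;
    private double[] denseValues; // allocated only for the dense format

    LocalBlock() { this(0, 0, true, 0); }
    LocalBlock(int rows, int cols, boolean sparse) { this(rows, cols, sparse, 0); }
    LocalBlock(int rows, int cols, boolean sparse, long estimatedNnz) {
        reset(rows, cols, sparse, estimatedNnz);
    }

    /** Shortcut: keep dimensions and format, discard contents. */
    void reset() { reset(rows, cols, sparse, 0); }

    /** Shortcut: change dimensions, keep format. */
    void reset(int rows, int cols) { reset(rows, cols, sparse, 0); }

    /** The single canonical initialization path; all other entry points call this. */
    void reset(int rows, int cols, boolean sparse, long estimatedNnz) {
        if (rows < 0 || cols < 0 || estimatedNnz < 0)
            throw new IllegalArgumentException("negative dimension or nnz estimate");
        this.rows = rows;
        this.cols = cols;
        this.sparse = sparse;
        this.estimatedNnz = estimatedNnz;
        this.denseValues = sparse ? null : new double[Math.multiplyExact(rows, cols)];
    }

    int getNumRows() { return rows; }
    int getNumColumns() { return cols; }
    boolean isSparse() { return sparse; }
    long getEstimatedNnz() { return estimatedNnz; }
    int denseCapacity() { return denseValues == null ? 0 : denseValues.length; }
}
```

Because validation and allocation live in one method, a future change (say, lazy allocation) only has to be made once.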
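
A sketch of a standalone ctable() that accepts null for its optional weights argument, as the ternaryOperations() bullet suggests. It assumes DML table() semantics as we understand them: the inputs hold positive integer category codes, and output cell (a[i], b[i]) (1-based) accumulates weights[i], or 1 when no weights are given. LocalCTable and its signatures are hypothetical.

```java
// Hypothetical local ctable(): F[a[i]][b[i]] += w[i] with 1-based category codes.
// A null weights array means every observation contributes 1.
final class LocalCTable {
    static double[][] ctable(double[] a, double[] b, double[] weights) {
        if (a.length != b.length || (weights != null && weights.length != a.length))
            throw new IllegalArgumentException("input lengths differ");
        // Output dimensions are the largest category codes seen.
        int rows = 0, cols = 0;
        for (int i = 0; i < a.length; i++) {
            rows = Math.max(rows, category(a[i]));
            cols = Math.max(cols, category(b[i]));
        }
        double[][] table = new double[rows][cols];
        for (int i = 0; i < a.length; i++) {
            double w = (weights == null) ? 1.0 : weights[i];
            table[category(a[i]) - 1][category(b[i]) - 1] += w;
        }
        return table;
    }

    /** Convenience overload without weights. */
    static double[][] ctable(double[] a, double[] b) { return ctable(a, b, null); }

    // Validates that a value is a positive integer category code.
    private static int category(double v) {
        if (!(v >= 1 && v <= Integer.MAX_VALUE) || v != Math.floor(v))
            throw new IllegalArgumentException("not a positive integer category: " + v);
        return (int) v;
    }
}
```

For example, ctable(new double[]{1, 2, 2}, new double[]{1, 1, 3}, null) yields a 2x3 table with ones at (1,1), (2,1), and (2,3) in 1-based terms.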
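
A minimal sketch of decoupling serialization from Hadoop: the local class reads and writes itself through plain java.io.DataOutput/DataInput. Hadoop's Writable interface declares write(DataOutput) and readFields(DataInput) over these same java.io types, so a thin adapter in a separate Hadoop-specific package could delegate to these methods (that adapter is not shown). LocalDenseMatrix and its binary layout (two int dimensions followed by row-major doubles) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical dense matrix with Hadoop-free serialization (java.io only).
final class LocalDenseMatrix {
    final int rows, cols;
    final double[] values; // row-major

    LocalDenseMatrix(int rows, int cols, double[] values) {
        if (values.length != rows * cols)
            throw new IllegalArgumentException("values.length must equal rows * cols");
        this.rows = rows;
        this.cols = cols;
        this.values = values;
    }

    // Layout: rows (int), cols (int), then rows * cols doubles.
    void write(DataOutput out) throws IOException {
        out.writeInt(rows);
        out.writeInt(cols);
        for (double v : values) out.writeDouble(v);
    }

    static LocalDenseMatrix read(DataInput in) throws IOException {
        int rows = in.readInt();
        int cols = in.readInt();
        double[] values = new double[rows * cols];
        for (int i = 0; i < values.length; i++) values[i] = in.readDouble();
        return new LocalDenseMatrix(rows, cols, values);
    }

    // Convenience round trip through an in-memory byte array.
    byte[] toBytes() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            write(out);
        }
        return bytes.toByteArray();
    }

    static LocalDenseMatrix fromBytes(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return read(in);
        }
    }
}
```

A 2x2 matrix serializes to 4 + 4 + 4 * 8 = 40 bytes and round-trips unchanged, with no Hadoop classes on the classpath.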



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
