[
https://issues.apache.org/jira/browse/SYSTEMML-413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396904#comment-15396904
]
Matthias Boehm commented on SYSTEMML-413:
-----------------------------------------
[~freiss] that's a good start - a couple of additions:
1) Input/output: all the readers/writers are in 'org.apache.sysml.runtime.io' -
similar to the new frame readers and writers, the existing sequential and
parallel readers should be consolidated too. Casting functionality and
conversion to/from external representations can be found in
org.apache.sysml.runtime.util.DataConverter.
2) Operation libraries: Some of the performance-critical code is in our
LibMatrix* classes. I would like to keep them isolated, especially
LibMatrixMult, LibMatrixDatagen, LibMatrixReorg, LibMatrixBincell, and
LibMatrixAgg, as they are already quite large in code size.
3) Frames: One thing to keep in mind is that the buffer pool and some other
places are implemented in a generic manner against CacheBlocks with MatrixBlock
and FrameBlock implementing this abstraction. Any refactoring would need to
consider this.
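To make point 3 concrete, the generic buffer-pool pattern described above could look roughly like the following. This is a hedged sketch only: the interface and method names (`CacheBlock`, `getInMemorySize`, `BufferPool.fits`) are illustrative assumptions, not the actual SystemML signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a CacheBlock-style abstraction; names are
// illustrative, not the actual org.apache.sysml interfaces.
interface CacheBlock {
    long getInMemorySize();   // used by the buffer pool for eviction decisions
    int getNumRows();
}

// Toy dense-matrix block implementing the abstraction.
class MatrixBlockSketch implements CacheBlock {
    private final int rlen, clen;
    MatrixBlockSketch(int rlen, int clen) { this.rlen = rlen; this.clen = clen; }
    public long getInMemorySize() { return 8L * rlen * clen; } // dense doubles
    public int getNumRows() { return rlen; }
}

// Toy frame block (heterogeneous rows) implementing the same abstraction.
class FrameBlockSketch implements CacheBlock {
    private final List<String[]> rows = new ArrayList<>();
    void appendRow(String[] row) { rows.add(row); }
    public long getInMemorySize() {
        long size = 0;
        for (String[] r : rows)
            for (String s : r) size += 2L * s.length(); // ~2 bytes per char
        return size;
    }
    public int getNumRows() { return rows.size(); }
}

// The buffer pool can then reason about memory generically, without
// knowing whether a block holds a matrix or a frame.
class BufferPool {
    static boolean fits(CacheBlock b, long budgetBytes) {
        return b.getInMemorySize() <= budgetBytes;
    }
}
```

Any refactoring of MatrixBlock would have to keep such an abstraction intact, since the buffer pool is written against the interface rather than the concrete classes.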
> Runtime refactoring core matrix block library
> ---------------------------------------------
>
> Key: SYSTEMML-413
> URL: https://issues.apache.org/jira/browse/SYSTEMML-413
> Project: SystemML
> Issue Type: Task
> Components: Runtime
> Reporter: Matthias Boehm
>
> Pull the local (non-distributed) linear algebra components of SystemML into a
> separate package. Define a proper object-oriented Java API for creating and
> manipulating local matrices. Document this API. Refactor all tests of local
> linear algebra functionality so that those tests use the new API. Refactor
> the distributed linear algebra operators (both Spark and Hadoop map-reduce)
> to use the new APIs for local linear algebra.
> *Overall Refactoring Plan*
> The MatrixBlock class will be the core locus of refactoring. The file is over
> 6000 lines long, has dependencies on the HOPS and LOPS layers, and contains a
> lot of sparse matrix code that really ought to be in SparseBlock. Even if
> it’s modified in place, MatrixBlock will bear little resemblance to its
> current form after the refactoring is completed. I recommend setting aside
> the current MatrixBlock class and creating new classes with equivalent
> functionality by copying appropriate blocks of code from the old class.
> Major changes to make relative to MatrixBlock:
> * We should create a new DenseMatrixBlock class that only covers dense linear
> algebra.
> * Sparse-specific code should be moved into the SparseBlock class.
> * Common functionality across dense and sparse should go into the MatrixValue
> superclass.
> * There should be a new class with a name like “Matrix” (we’ll need one
> anyway to serve as the public API) that contains a pointer to a MatrixValue
> and can switch between different representations. Ideally this class should
> be designed so that, in the future, it can serve as a matrix ADT that will
> wrap both local and distributed linear algebra.
> * Several fields (maxrow, maxcolumn, numGroups, and various estimates of
> future numbers of nonzeros) are used for stashing data that is only for
> internal SystemML use. Either put these into a different data structure or
> provide a generic mechanism for tagging a matrix block with additional
> application-specific data.
> * Clean up and simplify the multiple different initialization methods
> (different variants of the constructors and the methods init() and reset()).
> There should be one canonical method for each major type of initialization.
> Other methods that are shortcuts (e.g. reset() with no arguments) should call
> the canonical method internally.
> * Consider refactoring the variants of ternaryOperations() that support
> ctable() into something simpler that is called ctable(), perhaps a Java API
> that can take null values for the optional arguments.
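The class hierarchy proposed in the bullets above could be sketched as follows. All names here (`MatrixValueSketch`, `DenseMatrixBlockSketch`, `MatrixSketch`) are proposals drawn from the plan, not existing SystemML code, and the bodies are minimal stand-ins.

```java
// Sketch of the proposed hierarchy: a common superclass, a dense-only
// block, and a public "Matrix" handle that owns a pointer to some
// representation. Names are assumptions, not existing SystemML APIs.
abstract class MatrixValueSketch {
    abstract double get(int r, int c);
    abstract long getNonZeros();
}

// Dense-only block: covers dense linear algebra exclusively; sparse
// code would live in a separate SparseBlock class.
class DenseMatrixBlockSketch extends MatrixValueSketch {
    private final double[] data; // row-major
    private final int cols;
    DenseMatrixBlockSketch(int rows, int cols) {
        this.data = new double[rows * cols];
        this.cols = cols;
    }
    void set(int r, int c, double v) { data[r * cols + c] = v; }
    double get(int r, int c) { return data[r * cols + c]; }
    long getNonZeros() {
        long nnz = 0;
        for (double d : data) if (d != 0) nnz++;
        return nnz;
    }
}

// Public-facing handle: callers hold a Matrix; the representation
// behind it can be swapped (e.g. dense to sparse) without the
// handle's identity changing.
class MatrixSketch {
    private MatrixValueSketch value;
    MatrixSketch(MatrixValueSketch value) { this.value = value; }
    double get(int r, int c) { return value.get(r, c); }
    void setRepresentation(MatrixValueSketch v) { this.value = v; }
}
```

Because callers only ever touch the handle, such a class could later wrap a distributed representation behind the same interface, matching the ADT goal stated above.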
> Other changes outside MatrixBlock:
> * The matrix classes currently depend on Hadoop I/O classes like Writable and
> DataInputBuffer. A local linear algebra library really shouldn’t require
> Hadoop. I/O methods that use Hadoop APIs should be factored out into a
> separate package. In particular, MatrixValue needs to be separated from
> Hadoop’s WritableComparable API.
> * The contents of the following packages need to move to the new library:
> sysml.runtime.functionobjects and sysml.runtime.matrix.operators
> * The library will need local input and output functions. I haven’t found
> suitable functions yet, but they may be hidden somewhere; in that case, the
> existing functions should be moved adjacent to the other local linear algebra
> code.
> * Utility functions under classes in sysml.runtime.util will need to be
> replicated.
> * The more obscure subclasses of MatrixValue (MatrixCell, WeightedCell, etc.)
> do NOT need to be moved over.
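One way to decouple the core classes from Hadoop, as suggested in the list above, is to keep serialization in a separate adapter class so the matrix type itself has no Writable dependency. A hedged sketch: `LocalMatrixBlock` and `MatrixBlockSerializer` are hypothetical names, and plain `java.io.DataOutput`/`DataInput` stand in here for the Hadoop-facing adapter that would implement Writable in the real system.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.io.Serializable;

// Core matrix type: plain Java, no Hadoop imports.
class LocalMatrixBlock implements Serializable {
    final int rows, cols;
    final double[] data; // row-major dense values
    LocalMatrixBlock(int rows, int cols, double[] data) {
        this.rows = rows; this.cols = cols; this.data = data;
    }
}

// Serialization lives in a separate class/package; only an adapter like
// this would implement Hadoop's Writable, keeping the core library
// Hadoop-free. DataOutput/DataInput are the same interfaces Writable's
// write()/readFields() methods receive.
class MatrixBlockSerializer {
    static void write(LocalMatrixBlock b, DataOutput out) throws IOException {
        out.writeInt(b.rows);
        out.writeInt(b.cols);
        for (double d : b.data) out.writeDouble(d);
    }
    static LocalMatrixBlock read(DataInput in) throws IOException {
        int rows = in.readInt(), cols = in.readInt();
        double[] data = new double[rows * cols];
        for (int i = 0; i < data.length; i++) data[i] = in.readDouble();
        return new LocalMatrixBlock(rows, cols, data);
    }
}
```

With this split, MatrixValue's separation from WritableComparable becomes a matter of moving two methods into the adapter rather than rewriting the class.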
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)