Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hama Wiki" for change 
notification.

The "SpMV" page has been changed by Mikalai Parafeniuk:
http://wiki.apache.org/hama/SpMV?action=diff&rev1=9&rev2=10

  == Distributed Sparse Matrix-Vector Multiplication on Hama ==
  
  === Introduction ===
- In further description we will research problem in form u = Av. Most 
computational algoritms spends large percent of time for solving large systems 
of linear equations. In general, system of linear equations can be represented 
in matrix form Ax = b, where A is matrix with n rows and n columns, b - vector 
of size n, x - unknown solution vector which we are searching. Some approaches 
for solving linear systems has iterative nature. Assume, we know the initial 
approximation of x = x0. After that we represent our system in form xn = Bxn-1 
+ c, where c - vector of size n. After that we have to found next 
approximations of x till the convergence. In real world most of matrices 
contain relatively small number of non-zero items in comparison to total number 
of matrix items. Such matrices are called sparse matrices, matrices which 
filled most with non-zero items are called dense. Sparse matrices arise when 
each variable from the set is connected with small subset of variables (for 
example, differential equation of heat conduction). So, this page will describe 
the problem of sparse matrix vector multiplication(SpMV) with use of Bulk 
Synchronous Programming(BSP) model implemented in Apache Hama project. As shown 
above, SpMV can be used in different iterative solvers for system of linear 
equations.
+ In this article we will explore the problem of sparse matrix-vector 
multiplication, which can be written in the form u = Av. Many computational 
algorithms spend a large share of their time solving large systems of linear 
equations. In general, a system of linear equations can be represented in matrix 
form as Ax = b, where A is a matrix with n rows and n columns, b is a vector of 
size n, and x is the unknown solution vector we are searching for. Some 
approaches for solving linear systems are iterative. Assume we know an initial 
approximation x = x0. We then rewrite the system in the form xn = Bxn-1 + c, 
where c is a vector of size n, and compute successive approximations of x until 
convergence. In the real world most matrices contain relatively few non-zero 
items compared to the total number of items. Such matrices are called sparse; 
matrices mostly filled with non-zero items are called dense. Sparse matrices 
arise when each variable from a set is connected to only a small subset of the 
other variables (for example, in the differential equation of heat conduction). 
So, this page describes the problem of sparse matrix-vector multiplication 
(SpMV) using the Bulk Synchronous Parallel (BSP) model implemented in the Apache 
Hama project. As shown above, SpMV can be used in different iterative solvers 
for systems of linear equations.
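To make the iterative scheme above concrete, here is a minimal sketch in plain 
Python (not Hama/BSP code; the system and all names are illustrative) of Jacobi 
iteration, which repeatedly applies an update of the form xn = Bxn-1 + c and 
whose cost is dominated by a matrix-vector product:
{{{
# Jacobi iteration for Ax = b: split A into diagonal D and off-diagonal R,
# then iterate x_new = D^-1 (b - R x), which has the form xn = B xn-1 + c.
# The off-diagonal sum is exactly the matrix-vector part of the update.
def jacobi(a, b, iterations=60):
    n = len(b)
    x = [0.0] * n  # initial approximation x0
    for _ in range(iterations):
        x = [(b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
             for i in range(n)]
    return x

# Small diagonally dominant system, so Jacobi converges; exact solution [1, 2].
a = [[4.0, 1.0], [2.0, 5.0]]
b = [6.0, 12.0]
solution = jacobi(a, b)
print(solution)  # approximately [1.0, 2.0]
}}}
For a sparse A the inner sum runs only over the non-zero entries of row i, 
which is exactly where an efficient SpMV pays off.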
  The Bulk Synchronous Parallel model proposes its own way of parallelizing 
programs. We specify an input path for the problem and a number of peers. The 
framework reads the input and divides it between peers. Peers can be 
processors, threads, separate machines, or different nodes of a cloud. A BSP 
algorithm is divided into a sequence of supersteps, and barrier synchronization 
of all peers is performed after each superstep. The BSP implementation in 
Apache Hama contains primitives for determining the peer number, communicating 
with other peers through different communication primitives, and optimizing 
communication between peers; it also inherits most features and approaches of 
the Hadoop project.
  
  === Problem description ===
- As a sequential problem SpMV is almost trivial problem. But in case of 
parallel version we should think about some additional aspects:
+ As a sequential problem SpMV is almost trivial. But for the parallel version 
we should think about some additional aspects:
-  1. Partitioning of matrix and vector components. This means that we should 
split the input matrix and vectors by peers, if we want to have benefits from 
usage of parallel algorithm. Wise partitioning should be taken or communication 
time will rise very much or we will get great load imbalance and algorithm will 
be inefficient.
+  1. Partitioning of matrix and vector components. This means that we should 
split the input matrix and vectors between peers if we want to benefit from a 
parallel algorithm. Partitioning should be done wisely: otherwise data locality 
won't be achieved, communication time will grow greatly, or we will get a 
severe load imbalance, and the algorithm will be inefficient.
   2. Load balancing. This means that each peer must perform nearly the same 
amount of work, and none of them should sit idle.
   3. We must take into account the Hadoop and Hama approach to parallelization.
  
  === Implementation tips ===
-  1. Framework splits the input file to peers automatically. So we don't need 
to perform mapping of matrix to peers manually. We only must define how matrix 
can be written to file and how it can be readed from it. If we create matrix, 
which consists from separate cells, framework will give some subset of cells to 
each peer. If we create matrix consisting from rows, framework will give subset 
of rows to each peer. The ways to influence on partitioning: creating different 
writables for matrices as described above, overriding default partitioner class 
behavior.
+  1. The framework splits the input file between peers automatically, so we 
don't need to map the matrix to peers manually. We only have to define how the 
matrix is written to a file and how it is read back from it. If we create a 
matrix that consists of separate cells, the framework will give some subset of 
cells to each peer. If we create a matrix consisting of rows, the framework 
will give a subset of rows to each peer. The ways to influence partitioning 
are: creating different writables for matrices, and overriding the default 
partitioner class behavior.
   2. We don't need to care about communication in the case of row-wise matrix 
access. First of all, the rows of the matrix are split automatically by the 
framework. After that we can compute the inner product of the vector and a 
concrete matrix row, and the result can be written directly to the output, 
because it is one of the cells of the result vector. In this case we assume 
that a peer's memory can fit two vectors. Even for a million x million matrix 
and a vector of size million, some megabytes will be enough to store them, so 
even if we split the input vector the gain in memory will be insignificant.
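The row-wise local computation described above can be sketched as follows 
(plain Python; the data representation is illustrative, not Hama's actual 
writable classes). Each sparse row, paired with the full dense vector, yields 
exactly one cell of the result:
{{{
# One cell of the result vector u = Av: the inner product of a sparse matrix
# row (stored as (column, value) pairs) with the dense input vector.
def row_times_vector(sparse_row, dense_vector):
    return sum(value * dense_vector[col] for col, value in sparse_row)

v = [2.0, 3.0, 6.0, 0.0]          # dense input vector, fits in peer memory
row = [(0, 1.0), (2, 6.0)]        # sparse row [1 0 6 0]
print(row_times_vector(row, v))   # 1*2 + 6*6 = 38.0
}}}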
  
  === Algorithm description ===
@@ -22, +22 @@

   1. Custom partitioning.
   2. Local computation.
   3. Output of result vector.
+  4. Construction of the dense vector.
- In setup stage every peer reads input dense vector from file. After that, 
framework will partition matrix rows by the algorithm provided in custom 
partitioner automatically. After that local computation is performed. We gain 
some cells of result vector, and they are written to output file.
+ In the setup stage every peer reads the input dense vector from a file. After 
that, the framework automatically partitions the matrix rows by the algorithm 
provided in the custom partitioner. Then the local computation is performed: we 
obtain some cells of the result vector in the bsp procedure, and they are 
written to the output file. The output file is reread to construct an instance 
of the dense vector for further computation.
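The stages above can be simulated with a toy sketch (plain Python; the modulo 
partitioning and the peer loop are illustrative, not Hama's actual partitioner 
or API):
{{{
# Toy simulation of the SpMV stages: partition sparse rows across peers,
# let each peer compute its cells locally, then merge the per-peer outputs
# into the dense result vector.
def partition_rows(rows, num_peers):
    # each row i goes to peer i % num_peers (illustrative partitioning)
    parts = [[] for _ in range(num_peers)]
    for i, row in enumerate(rows):
        parts[i % num_peers].append((i, row))
    return parts

def spmv(rows, vector, num_peers=2):
    result = [0.0] * len(rows)
    for peer_rows in partition_rows(rows, num_peers):
        # "local computation", performed independently on each peer
        for i, row in peer_rows:
            result[i] = sum(value * vector[col] for col, value in row)
    return result

# Rows stored as (column, value) pairs: the 4x4 matrix
# [[1,0,6,0],[0,4,0,0],[0,2,3,0],[3,0,0,5]] times the vector [2,3,6,0].
matrix = [[(0, 1.0), (2, 6.0)], [(1, 4.0)],
          [(1, 2.0), (2, 3.0)], [(0, 3.0), (3, 5.0)]]
print(spmv(matrix, [2.0, 3.0, 6.0, 0.0]))  # [38.0, 12.0, 24.0, 6.0]
}}}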
+ 
+ === Implementation ===
+ The implementation can be found in my GitHub repository 
[[https://github.com/ParafeniukMikalaj/spmv]], and the patch can be found in 
[[https://issues.apache.org/jira/browse/HAMA-524|Apache JIRA]] as soon as JIRA 
becomes available. The GitHub repository contains only the classes related to 
SpMV. I considered two possible use cases of SpMV:
+  1. Usage in pair with `RandomMatrixGenerator`.
+  2. Usage with arbitrary text files.
+ In this section you will see how to use SpMV in these two cases. NOTE: 
currently SpMV is buggy, so the output can be incorrect.
+ ==== Usage with RandomMatrixGenerator ====
+ `RandomMatrixGenerator`, like `SpMV`, works with the sequence file format. 
So, to multiply a random matrix with a random vector we do the following: 
generate the matrix and vector; run SpMV; convert the matrix, vector, and 
result to text files; view the matrix, vector, and result. This sequence is 
described by the following snippet:
+ {{{
+ 1: bin/hama jar hama-examples-0.6.0-SNAPSHOT.jar rmgenerator 
/user/hduser/spmv/matrix-seq 4 4 0.3 4
+ 2: bin/hama jar hama-examples-0.6.0-SNAPSHOT.jar rmgenerator 
/user/hduser/spmv/vector-seq 1 4 0.8 1
+ 3: bin/hama jar hama-examples-0.6.0-SNAPSHOT.jar spmv 
/user/hduser/spmv/matrix-seq /user/hduser/spmv/vector-seq 
/user/hduser/spmv/result-seq 4
+ 4: ../hadoop/bin/hadoop dfs -rmr /user/hduser/spmv/result-seq/part
+ 5: bin/hama jar hama-examples-0.6.0-SNAPSHOT.jar matrixtotext 
/user/hduser/spmv/matrix-seq /user/hduser/spmv/matrix-txt
+ 6: bin/hama jar hama-examples-0.6.0-SNAPSHOT.jar matrixtotext 
/user/hduser/spmv/vector-seq /user/hduser/spmv/vector-txt
+ 7: bin/hama jar hama-examples-0.6.0-SNAPSHOT.jar matrixtotext 
/user/hduser/spmv/result-seq /user/hduser/spmv/result-txt
+ 8: ../hadoop/bin/hadoop dfs -cat /user/hduser/spmv/matrix-txt/*
+     0  4 1 2 0.09775514911904559
+     3  4 1 1 0.22718006778335464
+     1  4 1 1 0.796916052801057
+     3  4 1 2 0.9680719390036476
+     2  4 1 0 0.44269679525022
+ 9: ../hadoop/bin/hadoop dfs -cat /user/hduser/spmv/vector-txt/*
+     0  4 4 0 0.3111172930766064 1 0.27242674637140374 2 0.11394746764097863 3 
0.0
+ 10: ../hadoop/bin/hadoop dfs -cat /user/hduser/spmv/result-txt/*
+     0  5 5 0 0.011138951690981488 1 0.21710124739573375 2 0.1377306285919371 
3 0.11030934594375758 4 0.0
+ }}}
+ Lines 1-2: Generation of the input matrix and vector.<<BR>>
+ Line 3: The SpMV algorithm.<<BR>>
+ Line 4: Deletion of part files from the output directory. NOTE: 
`matrixtotext` will fail if this step is not performed, because `result-seq` 
will contain a part folder and `matrixtotext` doesn't know how to deal with it 
yet.<<BR>>
+ Lines 5-7: Conversion of the input matrix, input vector, and result to text 
format.<<BR>>
+ Lines 8-10: Showing the result. NOTE: currently SpMV is buggy - you can see 
that the output vector has length 5 and the last cell of the vector has an 
incorrect value. This will be fixed soon.
+ 
+ ==== Usage with arbitrary text files ====
+ SpMV works with `SequenceFile`, so we need tools to convert the input and 
output of SpMV between sequence file format and text format. These tools are 
`matrixtoseq` and `matrixtotext`. These programs are included in the example 
driver, so they can be launched like any other example. `matrixtoseq` converts 
a matrix represented in a text file to sequence file format. This program also 
lets you choose the target writable: `DenseVectorWritable` or 
`SparseVectorWritable`.
+ {{{
+ Usage: matrixtoseq <input matrix dir> <output matrix dir> <dense|sparse> 
[number of tasks (default max)]
+ }}}
+ `matrixtotext` converts matrix from sequence file format to text file.
+ {{{
+ Usage: matrixtotext <input matrix dir> <output matrix dir> [number of tasks 
(default max)]
+ }}}
+ To use SpMV in this mode you should provide text files in the appropriate 
format. I decided to represent all matrices and vectors as follows: each row of 
the matrix is represented by the row index, the length of the row, the number 
of non-zero items, and pairs of index and value. All values inside a row are 
separated by whitespace, and rows are separated by newlines. Vectors are 
represented as matrix rows with an arbitrary row index (not used). So, for 
example:
+ {{{
+ [1 0 2]    3 2 0 1 2 2
+ [0 0 0]  = 3 0
+ [0 5 1]    3 2 1 5 2 1
+ }}}
+ Now let's work through an example. Imagine that you need to multiply
+ {{{
+  [1 0 6 0]   [2]   [38] 
+  [0 4 0 0] * [3] = [12] 
+  [0 2 3 0]   [6]   [24] 
+  [3 0 0 5]   [0]   [6]
+ }}}
+ First of all, you should create appropriate text files for the input matrix 
and input vector. The input matrix file should look like
+ {{{
+ 0 4 2 0 1 2 6
+ 1 4 1 1 4
+ 2 4 2 1 2 2 3
+ 3 4 2 0 3 3 5
+ }}}
+ The vector file should look like
+ {{{
+ 0 4 3 0 2 1 3 2 6
+ }}}
+ After that you should copy these files to HDFS. If you don't feel comfortable 
with HDFS, please see 
[[http://hadoop.apache.org/common/docs/r0.20.0/hdfs_shell.html|this tutorial]]. 
I propose the following directory structure for this example:
+ {{{
+ /user/hduser/spmv/matrix-seq
+ /user/hduser/spmv/matrix-txt
+ /user/hduser/spmv/result-seq
+ /user/hduser/spmv/result-txt
+ /user/hduser/spmv/vector-seq
+ /user/hduser/spmv/vector-txt
+ }}}
+ The suffix `seq` denotes that a directory contains sequence files; the suffix 
`txt` denotes that a directory contains human-readable text files in the format 
described above. After you have copied the input matrix into `matrix-txt` and 
the input vector into `vector-txt`, we are ready to start. The following 
snippet shows how you can multiply matrices in this mode. Explanations are 
given below.
+ {{{
+ 1: bin/hama jar hama-examples-0.6.0-SNAPSHOT.jar matrixtoseq 
/user/hduser/spmv/matrix-txt /user/hduser/spmv/matrix-seq sparse 4
+ 2: bin/hama jar hama-examples-0.6.0-SNAPSHOT.jar matrixtoseq 
/user/hduser/spmv/vector-txt /user/hduser/spmv/vector-seq dense 4
+ 3: bin/hama jar hama-examples-0.6.0-SNAPSHOT.jar spmv 
/user/hduser/spmv/matrix-seq /user/hduser/spmv/vector-seq 
/user/hduser/spmv/result-seq 4
+ 4: ../hadoop/bin/hadoop dfs -rmr /user/hduser/spmv/result-seq/part
+ 5: bin/hama jar hama-examples-0.6.0-SNAPSHOT.jar matrixtotext 
/user/hduser/spmv/result-seq /user/hduser/spmv/result-txt
+ 6: ../hadoop/bin/hadoop dfs -cat /user/hduser/spmv/result-txt/part-00000
+     0  4 4 0 38.0 1 12.0 2 24.0 3 6.0
+ }}}
+ Line 1: Conversion of the input matrix to sequence file format, internally 
consisting of `SparseVectorWritable`.<<BR>>
+ Line 2: Conversion of the input vector to sequence file format, internally 
consisting of `DenseVectorWritable`.<<BR>>
+ Line 3: The SpMV algorithm.<<BR>>
+ Line 4: We delete part files from the output directory. NOTE: `matrixtotext` 
will fail if this step is not performed, because `result-seq` will contain a 
part folder and `matrixtotext` doesn't know how to deal with it yet.<<BR>>
+ Line 5: Conversion of the result vector to text format.<<BR>>
+ Line 6: Output of the result vector. You can see that we obtained the 
expected vector.<<BR>>
+ 
  
  === Possible improvements ===
+  1. Bug fixing. My main aim now is to make SpMV work stably.
-  1. Significant improvement in total time of algorithm can be achieved by 
creating custom partitioner class. It will give us load balancing and therefore 
better efficiency. This is the main possibility for optimization, because we 
decided, that using of row-wise matrix access i acceptable. Maybe it can be 
achieved by reordering of input or by customizing partitioning algorithm of 
framework.
+  2. A significant improvement in the total running time of the algorithm can 
be achieved by creating a custom partitioner class. It will give us load 
balancing and therefore better efficiency. This is the main opportunity for 
optimization, because we decided that row-wise matrix access is acceptable. It 
could be achieved by reordering the input or by customizing the partitioning 
algorithm of the framework.
  
  === Literature ===
   1. Rob H. Bisseling - Parallel Scientific Computation (chapter 4).
