Ismael Juma (JIRA) wrote:
[ https://issues.apache.org/jira/browse/MATH-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653805#action_12653805 ]
Ismael Juma commented on MATH-230:
----------------------------------
Or we could just add the appropriate specialised Map to commons-math. Are there
objections to that? I could provide one, it's relatively easy to do. I think
the best approach might even be one not mentioned above. Use a Map.Entry object
with two ints and a double. This has a few advantages:
(1) We use chaining instead of open addressing, so we can use a higher load
factor with good performance (e.g. 0.75 instead of 0.50). This should mitigate
the additional memory used by the Map.Entry object.
(2) We use a single array of Map.Entry objects instead of two arrays (one for
keys and one for values). When testing the performance for Scala HashMap
implementations (which converts to bytecode very similar to the one generated
by Java), it seems that using two arrays performs quite a bit worse (although
the test was using Strings instead of primitives for the value)[1].
(3) Gets don't require any boxing if we provide a get(int, int) method (this is
the same as using open addressing with primitives only though).
A rough sample of the implementation outline:
class IntPairToDoubleHashMap {
static class Entry {
final int firstKey;
final int secondKey;
double value;
}
int size;
Entry[] entries;
double get(int firstKey, int secondKey)
double put(int firstKey, int secondKey, double value)
}
This would be an internal map for usage by SparseRealMatrixImpl. I think
HotSpot would be able to inline calls to it, but we could also just have the
relevant code in SparseMatrixImpl. I think it's important to provide an
implementation that has the right performance characteristics by default.
What do you think?
[1] http://www.nabble.com/-scala--Open-versus-Chained-HashMap-td19254845.html
Implement Sparse Matrix Support
-------------------------------
Key: MATH-230
URL: https://issues.apache.org/jira/browse/MATH-230
Project: Commons Math
Issue Type: Improvement
Affects Versions: 2.0
Environment: N/A
Reporter: Sujit Pal
Assignee: Luc Maisonobe
Priority: Minor
Fix For: 2.0
Attachments: patch.txt, RealMatrixImplPerformanceTest.java,
SparseRealMatrixImpl.java, SparseRealMatrixImplTest.java
I needed a way to deal with large sparse matrices using commons-math RealMatrix, so I
implemented it. The SparseRealMatrixImpl is a subclass of RealMatrixImpl, and the
backing data structure is a Map<Point,Double>, where Point is a struct like
inner-class which exposes two int parameters row and column. I had to make some
changes to the existing components to keep the code for SparseRealMatrixImpl clean.
Here are the details.
1) RealMatrix.java:
- added a new method setEntry(int, int, double) to set data into a matrix
2) RealMatrixImpl.java:
- changed all internal calls to data[i][j] to getEntry(i,j).
- for some methods such as add(), subtract(), premultiply(), etc, there
was code that checked for ClassCastException and had two versions,
one for a generic RealMatrix and one for a RealMatrixImpl. This has
been changed to have only one that operates on a RealMatrix. The
result is something like auto-type casting. So if:
RealMatrixImpl.add(RealMatrix) returns a RealMatrixImpl
SparseRealMatrixImpl.add(RealMatrix) returns a SparseRealMatrixImpl
3) SparseRealMatrixImpl added as a subclass of RealMatrixImpl.
4) LUDecompositionImpl changed to use a clone of the passed in RealMatrix
instead of its data[][] block, and now it uses clone.getEntry(row,col)
calls instead of data[row][col] calls.
5) LUDecompositionImpl returned RealMatrixImpl for getL(), getU(), getP()
and solve(). It now returns the same RealMatrix impl that is passed
in through its constructor for these methods.
6) New test for SparseRealMatrixImpl, mimics the tests in RealMatrixImplTest,
7) New static method to create SparseRealMatrixImpl out of a double[][] in
MatrixUtils.createSparseRealMatrix().
but using SparseRealMatrixImpl.
8) Verified that all JUnit tests pass.
*****************************************************************
This E-mail is confidential and intended only for the recipients to whom it is
addressed.
If you were not an intended recipient, please notify the sender and delete all
copies.
S4Carlisle does not accept liability for electronic file transfers.
*****************************************************************