[ 
https://issues.apache.org/jira/browse/MATH-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655953#action_12655953
 ] 

Ismael Juma commented on MATH-230:
----------------------------------

Yes, we still need an open addressed map if we want to have good space 
characteristics. A comparison of how much space each entry takes for each case 
follows (note that I ignore the things that take constant space independent of 
the number of entries like size and such):

For HashMap<Integer,Double>, assuming a 32-bit JVM (would be much worse for the 
64-bit one) and that the map has reached the maximum load factor (in bytes):

- Map.Entry object = 4 (key reference) + 4 (value reference) + 4 (next 
reference) + 4 (cached int hash) + 8 (JVM mark and klass)[1] = 24 bytes
- Integer object = 16 bytes[1]
- Double object = 16 bytes (the reason why Integer and Double objects take the 
same space is due to alignment padding in Integer)
- Space in Map.Entry[] = 4 bytes / 0.75 = 5.33 bytes
- Total = 41.33 bytes

For OpentIntToDoubleHashMap, assuming a 32-bit JVM (would be similar in 64-bit 
JVM since it doesn't rely on pointers in the non constant variables) that the 
map has reached the maximum load factor (in bytes):

- Space in byte[] = 1 byte / 0.50 (load factor) = 2 bytes
- Space in double[] = 8 bytes / 0.50 (load factor) = 16 bytes
- Space in int[] = 4 bytes / 0.50 (load factor) = 8 bytes
- Total = 26 bytes

So, HashMap takes more than 50% more space for a 32-bit JVM and quite a bit 
more for the 64-bit one (since it's pointer heavy and alignment issues make it 
worse) and it's harder on the GC because all those Double, Integer and 
Map.Entry objects must be collected. Furthermore, the additional space doesn't 
even necessarily help with cache locality since Map.Entry only holds references 
to the actual objects (unfortunately, until we get value types, a.k.a. structs, 
we can't implement the ideal HashMap).

[1] http://blogs.sun.com/jrose/entry/fixnums_in_the_vm

> Implement Sparse Matrix Support
> -------------------------------
>
>                 Key: MATH-230
>                 URL: https://issues.apache.org/jira/browse/MATH-230
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 2.0
>         Environment: N/A
>            Reporter: Sujit Pal
>            Assignee: Luc Maisonobe
>            Priority: Minor
>             Fix For: 2.0
>
>         Attachments: math-230.diff, patch.txt, 
> RealMatrixImplPerformanceTest.java, SparseRealMatrixImpl.java, 
> SparseRealMatrixImplTest.java
>
>
> I needed a way to deal with large sparse matrices using commons-math 
> RealMatrix, so I implemented it. The SparseRealMatrixImpl is a subclass of 
> RealMatrixImpl, and the backing data structure is a Map<Point,Double>, where 
> Point is a struct like inner-class which exposes two int parameters row and 
> column. I had to make some changes to the existing components to keep the 
> code for SparseRealMatrixImpl clean. Here are the details.
> 1) RealMatrix.java:
>    - added a new method setEntry(int, int, double) to set data into a matrix
> 2) RealMatrixImpl.java:
>    - changed all internal calls to data[i][j] to getEntry(i,j).
>    - for some methods such as add(), subtract(), premultiply(), etc, there
>      was code that checked for ClassCastException and had two versions,
>      one for a generic RealMatrix and one for a RealMatrixImpl. This has
>      been changed to have only one that operates on a RealMatrix. The 
>      result is something like auto-type casting. So if:
>        RealMatrixImpl.add(RealMatrix) returns a RealMatrixImpl
>        SparseRealMatrixImpl.add(RealMatrix) returns a SparseRealMatrixImpl
> 3) SparseRealMatrixImpl added as a subclass of RealMatrixImpl.
> 4) LUDecompositionImpl changed to use a clone of the passed in RealMatrix
>    instead of its data[][] block, and now it uses clone.getEntry(row,col)
>    calls instead of data[row][col] calls.
> 5) LUDecompositionImpl returned RealMatrixImpl for getL(), getU(), getP()
>    and solve(). It now returns the same RealMatrix impl that is passed 
>    in through its constructor for these methods.
> 6) New test for SparseRealMatrixImpl, mimics the tests in RealMatrixImplTest,
> 7) New static method to create SparseRealMatrixImpl out of a double[][] in
>    MatrixUtils.createSparseRealMatrix().
>    but using SparseRealMatrixImpl.
> 8) Verified that all JUnit tests pass.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to