On 12/31/2015 05:42 PM, Gilles wrote:
On Thu, 31 Dec 2015 12:54:00 -0600, Ole Ersoy wrote:
On 12/31/2015 11:10 AM, Gilles wrote:
On Wed, 30 Dec 2015 21:33:56 -0600, Ole Ersoy wrote:
Hi,
In RealMatrixFormat.parse() MatrixUtils makes the decision on what
type of RealMatrix instance to return.
Ideally, this is correct as the actual type is an "implementation detail".
Flexibility is gained if it
just returns double[][] letting the caller decide what type of
RealMatrix instance to create.
That could become a problem e.g. for sparse matrices where the persistent
format and the instance type could be optimized for space, but a "double[][]"
cannot be.
RealMatrixFormat.parse() first creates a double[][] and then it drops
it into the Matrix wrapper it thinks is best, per MatrixUtils. By
leaving out the last step, the caller can use MatrixUtils (or,
hopefully, MatrixFactory) to perform the next step. Or maybe there is
no next step; perhaps just having a double[][] is fine.
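
To make the proposal concrete, here is a minimal sketch of a parser that stops at the double[][] step. The class name MatrixParseSketch and the method parseToArray are hypothetical (they are not part of Commons Math), and the sketch only handles the plain "{{1,2},{3,4}}" layout with no localized number formatting.

```java
/** Hypothetical helper, not part of Commons Math. */
final class MatrixParseSketch {

    private MatrixParseSketch() { }

    /**
     * Parses text such as "{{1,2},{3,4}}" and returns the rows as a raw
     * double[][], leaving the choice of matrix wrapper to the caller.
     */
    static double[][] parseToArray(String source) {
        String s = source.trim();
        if (s.length() < 4 || !s.startsWith("{{") || !s.endsWith("}}")) {
            throw new IllegalArgumentException("Expected {{...},{...}} but got: " + source);
        }
        // Strip the outer "{{" and "}}", then split rows on "},{" (whitespace allowed).
        String body = s.substring(2, s.length() - 2);
        String[] rowTexts = body.split("\\}\\s*,\\s*\\{", -1);

        double[][] data = new double[rowTexts.length][];
        for (int i = 0; i < rowTexts.length; i++) {
            String[] cells = rowTexts[i].split(",", -1);
            if (i > 0 && cells.length != data[0].length) {
                throw new IllegalArgumentException("Row " + i + " has a different length");
            }
            data[i] = new double[cells.length];
            for (int j = 0; j < cells.length; j++) {
                // Double.parseDouble throws NumberFormatException on bad input.
                data[i][j] = Double.parseDouble(cells[j].trim());
            }
        }
        return data;
    }
}
```

The caller could then hand the result to new Array2DRowRealMatrix(data), new BlockRealMatrix(data), or MatrixUtils.createRealMatrix(data), or simply keep the array.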
My opinion is that this code should be in a separate IO module,
where the external format can be made more flexible and more
correct (such as not doing unnecessary allocation).
Totally with you on that. Ideally something along the lines of MatrixPersist
and MatrixParse classes that support localized formatting. Right now it's all
bundled up into RealMatrixFormat...probably due to time constraints. I'll look
at modularizing that part later. Right now I'm breaking up MatrixUtils into
MatrixFactory and LinearExceptionFactory, and then once the dust settles I can
look at the IO piece in more detail.
It's also better for modularity, as it
reduces RealMatrixFormat imports (MatrixUtils supports Field
matrices as well, and I'm attempting to separate real and field
matrices into two different modules).
For modularity, IO should not be in the same module as the core
algorithms.
I agree in general. I'm sticking all the 'Real' (excluding Field)
classes in one module (Vector and Matrix). AbstractRealMatrix uses
RealMatrixFormat, so it's tightly coupled ATM and it seems like it
belongs with the real Vector and Matrix classes so...
Given the major refactoring which you are attempting, why not drop
everything that does not belong?
Good point. I'll just strip out the formatting, etc. from AbstractRealMatrix
and reintroduce it in the IO module.
Also just curious if Array2DRowRealMatrix is worth keeping? It seems
like the performance of BlockRealMatrix might be just as good or
better regardless of matrix size ... although my testing is limited.
I recall having performed a benchmark years ago and IIRC, the
"BlockRealMatrix" started to be faster only for very large matrix sizes
(although I don't remember which).
That was what I was seeing as well. Once matrix rows reach 100K - 10
million, performance goes up between 2X and 5X, but I did not really
see any difference in performance (multiplication only) for small
data sets. So I'm assuming, like Luc indicated, that the
Array2DRowRealMatrix is only better when attempting to reuse the
underlying double[][] matrix a lot...
As I recall, for "small" matrices, the "Block" version was significantly
slower. Depends what we call "large" and "small"...
Hmm - That probably makes sense since Block has to create the block structure.
I'll have a second look once I get a good profiling setup added to the module.
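
As a starting point for that comparison, here is a rough timing sketch. It is an assumption-laden stand-in: it uses loop tiling on plain double[][] arrays rather than BlockRealMatrix's actual block storage, the class and method names are hypothetical, and the tile size of 32 is arbitrary. Single-shot System.nanoTime() measurements are also noisy (no JIT warm-up), so a harness such as JMH would be needed for numbers worth quoting.

```java
import java.util.Random;

/** Hypothetical timing sketch; not the Commons Math implementation. */
final class MultiplyTimingSketch {

    private MultiplyTimingSketch() { }

    /** Plain row-by-row multiplication (i-p-j loop order). */
    static double[][] multiplyPlain(double[][] a, double[][] b) {
        int n = a.length, k = b.length, m = b[0].length;
        double[][] c = new double[n][m];
        for (int i = 0; i < n; i++) {
            for (int p = 0; p < k; p++) {
                double aip = a[i][p];
                for (int j = 0; j < m; j++) {
                    c[i][j] += aip * b[p][j];
                }
            }
        }
        return c;
    }

    /** Same product, computed tile by tile to improve cache locality. */
    static double[][] multiplyTiled(double[][] a, double[][] b, int tile) {
        int n = a.length, k = b.length, m = b[0].length;
        double[][] c = new double[n][m];
        for (int i0 = 0; i0 < n; i0 += tile) {
            int iMax = Math.min(i0 + tile, n);
            for (int p0 = 0; p0 < k; p0 += tile) {
                int pMax = Math.min(p0 + tile, k);
                for (int j0 = 0; j0 < m; j0 += tile) {
                    int jMax = Math.min(j0 + tile, m);
                    for (int i = i0; i < iMax; i++) {
                        for (int p = p0; p < pMax; p++) {
                            double aip = a[i][p];
                            for (int j = j0; j < jMax; j++) {
                                c[i][j] += aip * b[p][j];
                            }
                        }
                    }
                }
            }
        }
        return c;
    }

    /** Fills a rows x cols array with reproducible pseudo-random values. */
    static double[][] random(int rows, int cols, long seed) {
        Random rng = new Random(seed);
        double[][] d = new double[rows][cols];
        for (double[] row : d) {
            for (int j = 0; j < cols; j++) {
                row[j] = rng.nextDouble();
            }
        }
        return d;
    }

    public static void main(String[] args) {
        // Sizes chosen arbitrarily to span "small" and "larger" square matrices.
        for (int n : new int[] {16, 128, 512}) {
            double[][] a = random(n, n, 1L);
            double[][] b = random(n, n, 2L);
            long t0 = System.nanoTime();
            multiplyPlain(a, b);
            long t1 = System.nanoTime();
            multiplyTiled(a, b, 32);
            long t2 = System.nanoTime();
            System.out.printf("n=%d plain=%d us tiled=%d us%n",
                    n, (t1 - t0) / 1000, (t2 - t1) / 1000);
        }
    }
}
```

Swapping the two methods for Array2DRowRealMatrix.multiply and BlockRealMatrix.multiply would give the comparison discussed above, including the cost of building the block structure.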
HAPPY NEW YEAR!!
Ole