Baunsgaard commented on code in PR #2406:
URL: https://github.com/apache/systemds/pull/2406#discussion_r2710488477


##########
src/main/java/org/apache/sysds/runtime/data/SparseBlockCSR.java:
##########
@@ -937,12 +938,12 @@ public boolean checkValidity(int rlen, int clen, long 
nnz, boolean strict) {
                }
 
                //2. correct array lengths
-               if(_size != nnz && _ptr.length < rlen+1 && _values.length < nnz 
&& _indexes.length < nnz ) {
+               if( _size != nnz || _ptr.length < rlen+1 || _values.length < 
nnz || _indexes.length < nnz ) {

Review Comment:
   same here, try the reuse.



##########
src/main/java/org/apache/sysds/runtime/data/SparseBlockCSC.java:
##########
@@ -581,11 +579,13 @@ public boolean checkValidity(int rlen, int clen, long 
nnz, boolean strict) {
                                throw new RuntimeException(
                                        "The values array should not contain 
zeros." + " The " + i + "th value is " + _values[i]);
                        }
+                       if(_indexes[i] < 0)
+                               throw new RuntimeException("Invalid index at 
pos=" + i);
                }
 
                //6. a capacity that is no larger than nnz times resize factor.
                int capacity = _values.length;
-               if(capacity > nnz * RESIZE_FACTOR1) {
+               if(capacity > INIT_CAPACITY && capacity > nnz * RESIZE_FACTOR1) 
{

Review Comment:
   The additional check seems redundant. Maybe I am missing something.
   
   I seem to remember misusing the INIT_CAPACITY somewhere else, and explicitly 
setting it high to avoid reallocation on appends to MCSR. 



##########
src/main/java/org/apache/sysds/runtime/data/SparseBlockCSC.java:
##########
@@ -550,12 +549,12 @@ public boolean checkValidity(int rlen, int clen, long 
nnz, boolean strict) {
                }
 
                //2. correct array lengths
-               if(_size != nnz && _ptr.length < clen + 1 && _values.length < 
nnz && _indexes.length < nnz) {
+               if(_size != nnz || _ptr.length < clen + 1 || _values.length < 
nnz || _indexes.length < nnz) {

Review Comment:
   Here is what i mentioned as duplicated, maybe reuse.



##########
src/main/java/org/apache/sysds/runtime/data/SparseBlock.java:
##########
@@ -501,16 +508,20 @@ public boolean contains(double pattern, int rl, int ru) {
        }
        
        public List<Integer> contains(double[] pattern, boolean earlyAbort) {
+               int nnz = UtilFunctions.computeNnz(pattern, 0, pattern.length);
                List<Integer> ret = new ArrayList<>();
                int rlen = numRows();
+
                for( int i=0; i<rlen; i++ ) {
                        int apos = pos(i);
                        int alen = size(i);
+                       if(nnz != alen) continue;

Review Comment:
   Not completely guaranteed if the underlying block is not compacted.
   
   Additionally, while I really like the optimization to leverage the nnz in 
the comparing vector, the comparing vector also should have the nonzeros at the 
correct indexes. Perhaps an implementation leveraging materializing a temporary 
int[] with such offsets, would be faster?



##########
src/main/java/org/apache/sysds/runtime/data/SparseBlockFactory.java:
##########
@@ -84,7 +86,10 @@ public static boolean isSparseBlockType(SparseBlock sblock, 
SparseBlock.Type typ
        public static SparseBlock.Type getSparseBlockType(SparseBlock sblock) {
                return (sblock instanceof SparseBlockMCSR) ? 
SparseBlock.Type.MCSR :

Review Comment:
   perhaps instead, add a method all of the SparseBlocks need to implement, and 
have them return their type. This avoids this unmaintainable dependency.



##########
src/main/java/org/apache/sysds/runtime/data/SparseBlockCOO.java:
##########


Review Comment:
   I am a bit horrified about how many bugs you found in this file :) 



##########
src/main/java/org/apache/sysds/runtime/data/SparseBlockCOO.java:
##########
@@ -221,12 +235,12 @@ public boolean checkValidity(int rlen, int clen, long 
nnz, boolean strict) {
                }
 
                //2. correct array lengths
-               if(_size != nnz && _cindexes.length < nnz && _rindexes.length < 
nnz && _values.length < nnz) {
+               if(_size != nnz || _cindexes.length < nnz || _rindexes.length < 
nnz || _values.length < nnz) {

Review Comment:
   This line is duplicated across the sparse blocks, maybe some of these 
validity checks can be unified?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to