philipportner opened a new pull request, #2288:
URL: https://github.com/apache/systemds/pull/2288

   Adds a test case with inputs that have multiple groups with varying row 
counts.
   
   This pattern comes from a `lineorder.csv` example dataset that currently 
causes a runtime exception for the `permutation-matrix` approach but works for 
the `nested-loop` approach.
   
   Why this happened:
   - `permutation-matrix` approach allocated space assuming every group has 
`maxRowsInGroup` rows
   - groups can differ in size, so `Y_temp_reduce` ends up with fewer rows than 
the reshape expects
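
The mismatch can be illustrated with a small Python sketch. This is not the DML from `raGroupby.dml`. It only counts rows, and it assumes (inferred from the description above) that the reshape expects `numGroups * maxRowsInGroup` rows. The key values are column 2 of the `lineorder.csv` sample below.

```python
from collections import Counter

# Column 2 (the group-by key) of the lineorder.csv sample, in row order.
keys = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 5.0, 1.0]

group_sizes = Counter(keys)                      # rows per distinct key
num_groups = len(group_sizes)
max_rows_in_group = max(group_sizes.values())

# Assumption: the reshape is sized as if every group were full.
rows_assumed = num_groups * max_rows_in_group
# Rows that actually exist across all groups.
rows_available = len(keys)

print("group sizes:", dict(group_sizes))
print("rows assumed:", rows_assumed, "rows available:", rows_available)
```

With this sample the groups hold 3, 2, 2, 2, and 1 rows, so the two row counts differ whenever at least one group is smaller than the largest one.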
   
   Changes:
   - correctly pads the matrix when groups do not all have `maxRowsInGroup` 
rows
   - adds test cases that cover this pattern
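
As a rough illustration of the padding idea, here is a plain-Python sketch. It is not the DML change in this PR. The helper `group_and_pad` is hypothetical, and the output layout (one row per group: the key, then the remaining columns of each member row, zero-filled up to the largest group) is an assumption about what the group-by produces. The input uses columns 1, 2, and 5 of the `lineorder.csv` sample.

```python
def group_and_pad(rows, key_col, fill=0.0):
    """Hypothetical helper: one output row per key, zero-padded to the largest group."""
    groups = {}
    for row in rows:
        rest = row[:key_col] + row[key_col + 1:]   # drop the key column
        groups.setdefault(row[key_col], []).append(rest)

    max_rows_in_group = max(len(members) for members in groups.values())
    width = len(rows[0]) - 1                       # columns per member row

    result = []
    for key in sorted(groups):
        members = groups[key]
        # Pad short groups so every group contributes max_rows_in_group rows.
        missing = max_rows_in_group - len(members)
        padded = members + [[fill] * width for _ in range(missing)]
        result.append([key] + [value for member in padded for value in member])
    return result


# Columns 1, 2 and 5 of the lineorder.csv sample; the key is the second column.
rows = [
    [1.0, 1.0, 828.0], [1.0, 2.0, 163.0], [1.0, 3.0, 71.0], [1.0, 4.0, 943.0],
    [2.0, 1.0, 1066.0], [2.0, 2.0, 602.0], [2.0, 3.0, 138.0], [2.0, 4.0, 1382.0],
    [2.0, 5.0, 1684.0], [3.0, 1.0, 1959.0],
]

result = group_and_pad(rows, key_col=1)
for out_row in result:
    print(out_row)
```

Every output row has the same width regardless of how many rows its group contained, which is the property a fixed-size reshape needs.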
   
   To reproduce the original crash:
   
   `lineorder.csv`:
   ```
   0,1,2,3,4
   1.0,1.0,18238.0,155190.0,828.0
   1.0,2.0,18238.0,67310.0,163.0
   1.0,3.0,18238.0,63700.0,71.0
   1.0,4.0,18238.0,2132.0,943.0
   2.0,1.0,20612.0,106170.0,1066.0
   2.0,2.0,20612.0,194509.0,602.0
   2.0,3.0,20612.0,100164.0,138.0
   2.0,4.0,20612.0,45803.0,1382.0
   2.0,5.0,20612.0,4439.0,1684.0
   3.0,1.0,13813.0,4297.0,1959.0
   ```
   
   `crash.dml`:
   ```
   path_to_lineorder = "lineorder.csv"
   X  = read(path_to_lineorder, format = "csv", header=TRUE, sep = ",")
   source("./scripts/builtin/raGroupby.dml") as ra_new
   Y = ra_new::m_raGroupby(X, 2, "nested-loop")
   print(toString(Y)) # nested-loop works
   Y = ra_new::m_raGroupby(X, 2, "permutation-matrix")
   print(toString(Y)) # permutation-matrix breaks
   ```

