philipportner opened a new pull request, #2288: URL: https://github.com/apache/systemds/pull/2288
Adds a test case with inputs that contain multiple groups with varying row counts. This pattern comes from a `lineorder.csv` example dataset that currently causes a runtime exception with the `permutation-matrix` approach but works with the `nested-loop` approach.

Why this happened:
- the `permutation-matrix` approach allocated space assuming every group has `maxRowsInGroup` rows
- groups may have variable sizes, resulting in `Y_temp_reduce` having fewer rows than the reshape expects

Changes:
- correctly pads the matrix when groups do not all have `maxRowsInGroup` rows
- adds test cases that cover this pattern

To reproduce the original crash:

`lineorder.csv`:
```
0,1,2,3,4
1.0,1.0,18238.0,155190.0,828.0
1.0,2.0,18238.0,67310.0,163.0
1.0,3.0,18238.0,63700.0,71.0
1.0,4.0,18238.0,2132.0,943.0
2.0,1.0,20612.0,106170.0,1066.0
2.0,2.0,20612.0,194509.0,602.0
2.0,3.0,20612.0,100164.0,138.0
2.0,4.0,20612.0,45803.0,1382.0
2.0,5.0,20612.0,4439.0,1684.0
3.0,1.0,13813.0,4297.0,1959.0
```

`crash.dml`:
```
path_to_lineorder = "lineorder.csv"
X = read(path_to_lineorder, format = "csv", header=TRUE, sep = ",")
source("./scripts/builtin/raGroupby.dml") as ra_new

Y = ra_new::m_raGroupby(X, 2, "nested-loop")
print(toString(Y)) # nested-loop works

Y = ra_new::m_raGroupby(X, 2, "permutation-matrix")
print(toString(Y)) # permutation-matrix breaks
```
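
The size mismatch described under "Why this happened" can be illustrated with a small plain-Python sketch (not DML, and not the code in `raGroupby.dml`). It groups the sample `lineorder.csv` rows by column 2 and compares the real row count with the `numGroups * maxRowsInGroup` rows a fixed-size reshape would expect. The helper names `group_rows` and `pad_groups`, the zero fill value, and the output layout (key followed by the flattened group rows) are assumptions made for illustration only.

```python
from collections import defaultdict

# Data rows from the lineorder.csv sample in this PR description (header omitted).
rows = [
    [1.0, 1.0, 18238.0, 155190.0, 828.0],
    [1.0, 2.0, 18238.0, 67310.0, 163.0],
    [1.0, 3.0, 18238.0, 63700.0, 71.0],
    [1.0, 4.0, 18238.0, 2132.0, 943.0],
    [2.0, 1.0, 20612.0, 106170.0, 1066.0],
    [2.0, 2.0, 20612.0, 194509.0, 602.0],
    [2.0, 3.0, 20612.0, 100164.0, 138.0],
    [2.0, 4.0, 20612.0, 45803.0, 1382.0],
    [2.0, 5.0, 20612.0, 4439.0, 1684.0],
    [3.0, 1.0, 13813.0, 4297.0, 1959.0],
]


def group_rows(rows, col):
    """Group rows by the 1-based column `col`, dropping the key column (hypothetical helper)."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[col - 1]].append(r[:col - 1] + r[col:])
    return dict(sorted(groups.items()))


def pad_groups(groups, fill=0.0):
    """Pad every group to the largest group's row count so all output rows share one width.

    The zero fill and the [key, flattened rows...] layout are assumptions for this sketch.
    """
    max_rows = max(len(g) for g in groups.values())
    width = len(next(iter(groups.values()))[0])
    out = []
    for key, g in groups.items():
        flat = [v for r in g for v in r]
        flat += [fill] * ((max_rows - len(g)) * width)
        out.append([key] + flat)
    return out


groups = group_rows(rows, 2)
sizes = {k: len(g) for k, g in groups.items()}
max_rows = max(sizes.values())

# A reshape that assumes every group is full expects more rows than actually exist.
print("group sizes:", sizes)
print("actual rows:", len(rows), "rows assumed by reshape:", len(groups) * max_rows)

padded = pad_groups(groups)
print("padded row widths:", {len(r) for r in padded})
```

With unequal group sizes, the actual row count falls short of `numGroups * maxRowsInGroup`, which is the situation the padding change in this PR addresses. Once every group is padded, all output rows have the same width.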