Iseratho edited a comment on pull request #1169:
URL: https://github.com/apache/systemds/pull/1169#issuecomment-782887952


   ### Consideration when merging the PR
   
   When representing the tokens in long format (i.e., a transformation that 
expands on rows: (rows: n, maxTokens: m, idCols: k) -> (m*n, k+2)), I get the 
following error in a follow-up `transformencode`: 
   > Job aborted due to stage failure: Task 0 in stage 10.0 failed 1 times, 
most recent failure: Lost task 0.0 in stage 10.0 (TID 18, localhost, executor 
driver): org.apache.sysds.runtime.DMLRuntimeException: Number of non-zeros 
mismatch on merge disjoint (target=1000x4, nnz target=4000, nnz source=3992)
   
   Unfortunately, I have not been able to fix this bug, since it does not occur 
in `tokenize` itself but in the follow-up `transformencode`.
   However, I have since implemented a wide format (i.e., a transformation that 
expands on columns: (rows: n, maxTokens: m, idCols: k) -> (n, m+k)), with which 
I could not reproduce the issue. The current state of the PR uses this format in 
the test cases and passes all checks. 
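   
   To make the two output shapes concrete, here is a minimal plain-Python 
sketch. It is not the SystemDS implementation: the helper names `to_long` and 
`to_wide`, the padding value, and the assumption that the two extra long-format 
columns hold the token position and the token are all hypothetical and chosen 
only for illustration.

```python
def pad_tokens(tokens, max_tokens, pad=""):
    """Truncate or pad a token list to exactly max_tokens entries."""
    return (list(tokens) + [pad] * max_tokens)[:max_tokens]


def to_long(rows, max_tokens, pad=""):
    """Expand on rows: n inputs with k id columns -> m*n rows of k+2 columns.

    Assumption: the two extra columns are (position, token).
    """
    out = []
    for ids, tokens in rows:
        for pos, tok in enumerate(pad_tokens(tokens, max_tokens, pad), start=1):
            out.append(list(ids) + [pos, tok])
    return out


def to_wide(rows, max_tokens, pad=""):
    """Expand on columns: n inputs with k id columns -> n rows of m+k columns."""
    return [list(ids) + pad_tokens(tokens, max_tokens, pad) for ids, tokens in rows]


# n = 2 rows, k = 1 id column, m = 3 max tokens
rows = [(["doc1"], ["a", "b", "c"]), (["doc2"], ["d"])]
long_fmt = to_long(rows, max_tokens=3)   # shape (m*n, k+2) = (6, 3)
wide_fmt = to_wide(rows, max_tokens=3)   # shape (n, m+k)   = (2, 4)
```

   In this sketch the long format multiplies the row count by m, while the wide 
format keeps one output row per input row.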
   
   I have commented out the test cases that do not work; they should be 
addressed in a future change.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

