Jaybit0 opened a new pull request, #2062: URL: https://github.com/apache/systemds/pull/2062
This PR extends PR #2045 and implements support for data-dependent n-grams by using and extending the existing lineage functionality. Because DAGs are not linear sequences of instructions, the extension tracks every instruction path of length `n`. For example, given the DAG `(a*b + c/d)` and bigrams, the two operation sequences `[(*, +), (/, +)]` are added to the bigram store.

I also keep track of the data types of each instruction. This is why I extended the existing lineage functionality: the `_data` string is sometimes empty or contains inconsistent information.

The n-gram table now looks like this. The arguments in parentheses are the input parameters of an instruction (separated by `°`), and the suffix `[i]` is the zero-based parameter index in the following instruction (e.g., in the first entry the result of `rblk` is used as the second parameter of `ba+*`):

```
Most common 2-grams (sorted by absolute time):
#   N-Gram                          Time(s)  StdDev(t)/Mean(t)  Count
1   (rblk·MATRIX·FP64(MATRIX·FP64)  1,144    (, 1.067)          4
    [1], ba+*·MATRIX·FP64(MATRIX·F
    P64 ° MATRIX·FP64))
2   (rblk·MATRIX·FP64(MATRIX·FP64)  0,853    (, 0.469)          3
    [0], ba+*·MATRIX·FP64(MATRIX·F
    P64 ° MATRIX·FP64))
3   (createvar·MATRIX·FP64()[0], r  0,343    (0.627, 0.929)     2
    blk·MATRIX·FP64(MATRIX·FP64))
4   (rblk·MATRIX·FP64(MATRIX·FP64)  0,285    -                  1
    [0], cpvar·MATRIX·FP64(MATRIX·
    FP64))
5   (+*·MATRIX·FP64(MATRIX·FP64 °   0,153    -                  1
    SCALAR·FP64 ° MATRIX·FP64)[0],
     write·MATRIX·FP64(MATRIX·FP64
     ° L_SCALAR·STRING ° L_SCALAR·
    STRING ° L_SCALAR·INT64))
```

@mboehm7
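To illustrate the path-tracking idea described above, here is a minimal sketch in Python. It is not the SystemDS implementation: the `Node` class and the functions `paths_ending_at` and `collect_ngrams` are hypothetical names introduced only for this example. The sketch enumerates every operator path of length `n` in a small expression DAG and can optionally attach the `[i]` parameter-index suffix to each producer.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical DAG node: an operator and its inputs (op=None marks a data leaf)."""
    op: Optional[str]
    inputs: List["Node"] = field(default_factory=list)


def paths_ending_at(node: Node, n: int, with_index: bool = False) -> List[Tuple[str, ...]]:
    """Return all operator paths of length n whose last instruction is `node`."""
    if n == 1:
        return [(node.op,)]
    paths = []
    for i, inp in enumerate(node.inputs):
        if inp.op is None:  # leaves are data, not instructions
            continue
        for prefix in paths_ending_at(inp, n - 1, with_index):
            # Optionally record that the producer feeds parameter i of `node`.
            last = f"{prefix[-1]}[{i}]" if with_index else prefix[-1]
            paths.append(prefix[:-1] + (last, node.op))
    return paths


def collect_ngrams(root: Node, n: int, with_index: bool = False) -> List[Tuple[str, ...]]:
    """Visit each instruction reachable from `root` once and gather its n-grams."""
    seen, out, stack = set(), [], [root]
    while stack:
        node = stack.pop()
        if node.op is None or id(node) in seen:
            continue
        seen.add(id(node))
        out.extend(paths_ending_at(node, n, with_index))
        stack.extend(node.inputs)
    return out


# The example DAG from the description: (a*b + c/d)
a, b, c, d = (Node(None) for _ in range(4))
expr = Node("+", [Node("*", [a, b]), Node("/", [c, d])])

bigrams = collect_ngrams(expr, 2)
print(bigrams)                                   # [('*', '+'), ('/', '+')]
print(collect_ngrams(expr, 2, with_index=True))  # [('*[0]', '+'), ('/[1]', '+')]

# A Counter stands in for the n-gram store that accumulates occurrences.
store = Counter(bigrams)
```

In this sketch a path is ordered from producer to consumer, which matches the `(*, +)` notation used above. A real n-gram store would also aggregate timing statistics per entry, which this example leaves out.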