phaniarnab commented on PR #2036: URL: https://github.com/apache/systemds/pull/2036#issuecomment-2241614037
Thank you, @Mayaryin and @ingunnaf, for your contribution. With some changes, I see a 2x speedup for real use cases, which is very good. I will not merge this PR immediately, as the changes are in the critical path of transformencode and may impact other running projects. However, after improving the robustness of this feature, I will merge it before the next release.

The TODOs include:
- Integrate the lineage trace of the input frame into the key of the metadata cache, either by adding the hash/checksum of the lineage trace or by making the build tasks lineage traceable. This extension will avoid incorrect reuse if the input frame is modified.
- The number of bins may need to be added to the key to avoid incorrect reuse across different numbers of bins.
- Robustness of the hash function.

Future work outside the scope of this PR:
- Caching and reuse of apply task results, which requires an effective output allocation strategy.
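
To make the first two TODOs concrete, here is a minimal sketch of a cache key that carries both a lineage hash and the number of bins. All names (`BuildCacheKey`, `columnId`, `encoderType`, `numBins`, `lineageHash`) are hypothetical and do not come from the SystemDS code base or from this PR; how the lineage hash is computed is left open and is assumed to be supplied by the caller.

```java
import java.util.Objects;

// Hypothetical key for a metadata (build result) cache.
// None of these names are taken from SystemDS; this is only an illustration.
final class BuildCacheKey {
    private final int columnId;        // column the encoder is built for
    private final String encoderType;  // e.g. "BIN" or "RECODE" (illustrative labels)
    private final int numBins;         // number of bins; 0 if the encoder has no bins
    private final long lineageHash;    // assumed: hash/checksum of the input frame's lineage trace

    BuildCacheKey(int columnId, String encoderType, int numBins, long lineageHash) {
        this.columnId = columnId;
        this.encoderType = Objects.requireNonNull(encoderType);
        this.numBins = numBins;
        this.lineageHash = lineageHash;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof BuildCacheKey)) return false;
        BuildCacheKey other = (BuildCacheKey) o;
        // Every field participates, so a modified input frame (different
        // lineage hash) or a different bin count never matches an old entry.
        return columnId == other.columnId
            && numBins == other.numBins
            && lineageHash == other.lineageHash
            && encoderType.equals(other.encoderType);
    }

    @Override
    public int hashCode() {
        // Must use the same fields as equals to honor the equals/hashCode contract.
        return Objects.hash(columnId, encoderType, numBins, lineageHash);
    }
}
```

With a key of this shape, a lookup for the same column with a different number of bins, or over a frame with a different lineage, misses the cache instead of reusing a stale build result. Comparing all fields in `equals` also means that a weak hash only costs performance (more collisions) and not correctness.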