kykrueger commented on PR #1847: URL: https://github.com/apache/systemds/pull/1847#issuecomment-1596249203
@Baunsgaard I have a few questions about requirements now that I've started on a proof-of-concept script.

1. I've chosen the `timeit` module to make the timings more reproducible: it lets us exclude setup from the recorded time and disables some sources of extra processing time, such as Python's garbage collector. I wasn't sure whether this is what you want, though. Are you on board with this approach, or would you prefer a big-picture benchmark with all overhead included?

2. I've noticed that data converted from numpy and pandas is transferred lazily to SystemDS in the JVM, so calling `from_numpy` or `from_pandas` alone doesn't let me measure the real overhead. If you prefer the big-picture benchmark above, I figure I'd have to add some simple operator and call `compute` or `get_lineage` to force the data to load. Another option would be to trick SystemDS into loading the data by creating an empty `DMLScript` and calling the hidden internal `__prepare_script()` method, which has a bit less overhead than `get_lineage()`. I'd prefer this second option since it is more isolated. Does it meet your needs?
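For reference, here is a minimal sketch of the `timeit` approach from point 1. It uses only the standard library. The helper name `time_conversion` and the list-to-float statement are hypothetical stand-ins for the real numpy/pandas conversion, not SystemDS code. The `setup` string runs outside the timed region, and `timeit` turns off the garbage collector while timing by default.

```python
import timeit


def time_conversion(stmt, setup, repeat=5, number=1):
    """Return the best per-call time in seconds; `setup` is not timed.

    `time_conversion` is a hypothetical helper for illustration only.
    """
    timer = timeit.Timer(stmt=stmt, setup=setup)
    # timeit disables the garbage collector during timing by default.
    runs = timer.repeat(repeat=repeat, number=number)
    # The minimum is the run least disturbed by other processes.
    return min(runs) / number


# Stand-in workload (assumption): building the input is setup and is
# excluded from the measurement; only the "conversion" is timed.
setup = "data = list(range(100_000))"
stmt = "converted = [float(x) for x in data]"

best = time_conversion(stmt, setup)
print(f"best of 5 runs: {best:.6f} s")
```

For the actual benchmark, `setup` would create the numpy array or pandas frame, and `stmt` would hold the conversion plus whatever call forces the transfer to the JVM.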