Baunsgaard commented on PR #1847: URL: https://github.com/apache/systemds/pull/1847#issuecomment-1607319364
> Since I generated the data with systemds, it comes with a schema. When it comes to the interpolation, I had meant to say I didn't specify dtypes for numpy and pandas, but they had been smart enough to recognize the values as float64.
>
> In order to get benchmarks for the other data types, I could optionally pass a dtype to my scripts, which forces numpy and pandas to read the values into the type we want. I'll also check out overriding the types provided in the MTD files in the read command for systemds. I think I remember seeing something in your docs that makes that possible. This should make it easy to test all of the data types you listed without creating additional scripts. How about it?

The issue lies primarily in matrices: again, the data type inside systemds is currently always double, so transferring a matrix out to numpy always has the same cost. What I would still be interested to know is whether the transfer in is efficient. For pandas and frames we need the different types, and yes, these can be specified in the MTD file. As long as the content of the CSV is correct, it should work.

> I'm not sure why you would like to generate the data in the scripts; that would leave us writing the same generation code multiple times. I'd prefer to keep using a generation format similar to the existing perftest scripts. However, I don't think the large data sizes are very useful. Since the 80MB dataset stalls pandas for much too long, I'm going to add a smaller generation option to the list and set it as the default.

Feel free to reduce the data size further. As said previously, it is known that our pandas implementation is slow; we need to fix it after this task.
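A minimal sketch of the optional-dtype idea from the quoted proposal, assuming the benchmark scripts read a headerless, comma-separated CSV. The helper names `read_numpy`, `read_pandas`, and `build_parser` are hypothetical and not taken from the PR; only `np.loadtxt` and `pd.read_csv` are real library calls.

```python
import argparse

import numpy as np
import pandas as pd


def read_numpy(path, dtype=None):
    # np.loadtxt parses every value as float64 unless a dtype is forced.
    return np.loadtxt(path, delimiter=",", dtype=np.float64 if dtype is None else dtype)


def read_pandas(path, dtype=None):
    # Assumes the generated CSV has no header row. With dtype=None pandas
    # infers one type per column (float64 for decimal values).
    return pd.read_csv(path, header=None, dtype=dtype)


def build_parser():
    # Hypothetical CLI for a benchmark script: the dtype flag is optional.
    parser = argparse.ArgumentParser(description="Read benchmark CSV (sketch).")
    parser.add_argument("path", help="CSV file produced by the generation step")
    parser.add_argument("--dtype", default=None,
                        help="optional forced dtype, e.g. float32 or int64")
    return parser
```

For frames with mixed column types, `pd.read_csv` also accepts a per-column mapping such as `{0: "int64", 1: "float64"}` for `dtype` (column labels are integers when `header=None`), which is one way to mirror a frame schema on the pandas side.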
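On the transfer-in question, one piece that can be timed in isolation is the cast to double. This sketch assumes, without confirming it against the SystemDS Python API, that data passed in must end up as a contiguous float64 buffer because matrices are always double internally. `to_double_buffer` and `time_cast` are hypothetical helpers, not SystemDS functions.

```python
import timeit

import numpy as np


def to_double_buffer(arr):
    # Returns arr without copying if it is already C-contiguous float64;
    # otherwise allocates a new float64 array and casts the values.
    return np.ascontiguousarray(arr, dtype=np.float64)


def time_cast(n_rows=1000, n_cols=100,
              dtypes=("float64", "float32", "int64", "int32"), repeat=5):
    # Best-of-`repeat` wall time (seconds) for converting each input dtype to float64.
    rng = np.random.default_rng(0)
    base = rng.random((n_rows, n_cols)) * 100
    results = {}
    for dt in dtypes:
        arr = base.astype(dt)
        times = timeit.repeat(lambda a=arr: to_double_buffer(a), number=1, repeat=repeat)
        results[dt] = min(times)
    return results
```

Since a float64 input is returned without a copy, its entry approximates the no-conversion baseline, and the other entries show the extra cost a non-double input would add on top of the actual transfer.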
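A sketch of a generation size option with a small default, in the spirit of the proposed change. The preset names and dimensions in `SIZES` are placeholders rather than the values used in the PR, and `dims` is a hypothetical hook for passing rows and columns on to the existing perftest-style generation step, which is not shown here.

```python
import argparse

# Hypothetical size presets as (rows, cols). The numbers are placeholders;
# "small" is the default so that pandas runs stay short.
SIZES = {
    "small": (1_000, 10),
    "medium": (100_000, 10),
    "large": (1_000_000, 10),
}


def build_parser():
    parser = argparse.ArgumentParser(description="Pick a data size for generation (sketch).")
    parser.add_argument("--size", choices=list(SIZES), default="small",
                        help="size preset (default: small)")
    return parser


def dims(size):
    """Return (rows, cols) to hand to the existing generation step."""
    return SIZES[size]
```

Keeping the presets in one table means reducing the data size further is a one-line change rather than an edit to every script.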