Baunsgaard commented on PR #1847:
URL: https://github.com/apache/systemds/pull/1847#issuecomment-1607319364

   > Since I generated the data with systemds, it comes with a schema. When it 
comes to the interpolation, I had meant to say I didn't specify dtypes for 
numpy and pandas, but they had been smart enough to recognize the values as 
float64.
   
   
   
   > In order to get benchmarks for the other data-types, I could pass in a 
dtype optionally to my scripts which forces numpy and pandas to read the 
values into the type we want. I'll also check out overriding the types provided 
in the MTD files in the read command for systemds. I think I remember seeing 
something in your docs which makes that possible. This should make it easy to 
test for all of the data types you listed, without the creation of additional 
scripts, how about it?
   
   The issue lies primarily with matrices. Currently the datatype is always 
double inside systemds, and transferring it out to numpy always has the same 
cost. What I would still be interested to know is whether the transfer in is 
efficient.
   
   For pandas and frames we need the different types, and yes, these can be 
specified in the mtd file. As long as the content of the CSV is correct, it 
should work.
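   
   To make the optional dtype idea from the quote above concrete, here is a 
minimal sketch of how a script could force numpy and pandas to read into a 
requested type. The helper name `read_with_dtype` and the inline CSV text are 
hypothetical and only for illustration; they are not part of the PR or the 
perftest scripts.

```python
import io

import numpy as np
import pandas as pd


def read_with_dtype(csv_text, dtype=None):
    """Read the same CSV text with numpy and pandas, optionally forcing a dtype.

    Hypothetical helper for illustration; not taken from the PR.
    """
    # With dtype=None, pandas infers the column types itself
    # (float64 for plain decimal values).
    frame = pd.read_csv(io.StringIO(csv_text), header=None, dtype=dtype)

    # np.loadtxt reads as float64 unless a dtype is passed explicitly.
    # Compare against None instead of using `dtype or ...`, because
    # np.dtype objects can evaluate as falsy.
    np_dtype = np.float64 if dtype is None else dtype
    matrix = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=np_dtype)
    return matrix, frame


csv_text = "1.5,2.5\n3.5,4.5\n"
default_matrix, default_frame = read_with_dtype(csv_text)
float32_matrix, float32_frame = read_with_dtype(csv_text, dtype=np.float32)
```

   Leaving `dtype` unset keeps the inferred types, so the same script could 
cover both the default case and the forced-type cases.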
   
   > I'm not sure about why you would like to generate the data in the scripts, 
that would leave us with writing the same generation code multiple times. I'd 
prefer to keep using the generation format similar to the existing perftest 
scripts. However, I don't think the large data-sizes are very useful. Since the 
80MB dataset stalls pandas for much too long, I'm going to add a smaller 
generation option to the list and set it as the default.
   
   Feel free to reduce the data size further. As said previously, it is known 
that our pandas implementation is slow. We need to fix it after this task.


