WDRshadow commented on PR #2050:
URL: https://github.com/apache/systemds/pull/2050#issuecomment-2241077556

   > @WDRshadow, thanks for putting the numbers here. Did you take an average 
of 3 runs to capture the execution time? If not, please do that to avoid the 
JIT compilation and GC overheads. And I assume the numbers reported in this 
table only measure the total inference time and not the training time.
   > 
   > The speedup from 2 GPUs is way less than I expected. Can you explain, why 
the speedup is not consistently 2x? If you are scoring n images, then each GPU 
gets n/2 images, which should lead to 2x speedup. I do not anticipate any 
additional overhead for two GPUs for this use case.
   
   Thanks. Your assumptions are not quite accurate. The numbers in the table are total execution times, which include exactly the same training process running before the `parfor` loop, so the parallel scoring part is only a fraction of what is being measured; that is one reason the speedup is below 2x. I am not familiar with `.dml` scripts and do not have the time to learn them in depth, so I do not yet know how to store a trained model and read it back in order to time inference alone.
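   If DML's built-in `write()` and `read()` work the way I expect, a minimal sketch of persisting the trained parameters and reloading them in a separate scoring-only run might look like the following (the matrix name and file path are hypothetical, and I have not tested this):

```dml
# After training: persist a weight matrix.
# A placeholder matrix is generated here just to keep the example self-contained.
W = rand(rows=100, cols=10)            # stand-in for the actual trained weights
write(W, "tmp/model_W", format="binary")

# In a later, scoring-only run: reload the weights, then time just the parfor loop.
W = read("tmp/model_W")
print(sum(W))                          # sanity check that the matrix was read back
```

   With the model persisted like this, the timed region could start right before the `parfor` loop, which would make the per-GPU speedup comparison cleaner.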

