christinadionysio commented on PR #1943:
URL: https://github.com/apache/systemds/pull/1943#issuecomment-1845913037

   After running the perftest, I created two figures showing that the first 
(`dist`) and second (`dist_missing`) methods do not scale to larger datasets, 
whereas the third method (`dist_sample`) handles them by decreasing the sample 
size. The first figure shows the runtime of each method for different dataset 
sizes. 
   As mentioned, the first two methods fail with a Java heap space exception 
on larger datasets, which explains their missing values for 
`# rows 100000 1000000 10000000`.
   
   
[knn_perf_runtime.pdf](https://github.com/apache/systemds/files/13603755/knn_perf_runtime.pdf)
   
   The second figure shows how the sample size for the third method was 
decreased as the datasets grew larger. 
   
   
[knn_perf_sampling_fac.pdf](https://github.com/apache/systemds/files/13603758/knn_perf_sampling_fac.pdf)
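
   To illustrate the general idea of shrinking the sample as the dataset grows, here is a minimal Python sketch. It is not the code from this PR: the function name `sampling_fraction` and the cap `max_sampled_rows` are hypothetical, and the values actually used in the perftest are the ones in the figure above, not the ones produced here.

```python
def sampling_fraction(n_rows, max_sampled_rows=10_000):
    """Return the fraction of rows to sample so that at most
    max_sampled_rows rows are kept (hypothetical cap, for illustration)."""
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    # Small datasets are used in full; larger ones get a smaller fraction.
    return min(1.0, max_sampled_rows / n_rows)


if __name__ == "__main__":
    # The fraction shrinks as the number of rows grows.
    for n in (1_000, 10_000, 100_000, 1_000_000, 10_000_000):
        print(n, sampling_fraction(n))
```

   With a fixed cap like this, the number of sampled rows stays bounded, so the memory needed for the distance computation no longer grows with the dataset size.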
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
