[ 
https://issues.apache.org/jira/browse/IGNITE-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280426#comment-16280426
 ] 

Oleg Ignatenko edited comment on IGNITE-7097 at 12/8/17 5:23 PM:
-----------------------------------------------------------------

I experimented a bit with running the benchmarks against the ML codebase in 
master: the issue is still reproducible. Whatever causes it apparently hasn't 
been fixed yet. I plan to check whether there is some kind of leak: 
intermediate objects may be created by {{likeMatrix}} or {{likeVector}} 
(possibly, e.g., through {{copy}}) somewhere within multiplication without a 
corresponding {{destroy}}. In past experiments it was a missing {{destroy}} 
that caused yardstick to hang.
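
To illustrate the kind of leak hypothesized above, here is a minimal sketch. 
It does not use the Ignite ML API: {{TrackedMatrix}}, {{like}}, {{times}}, 
{{timesTwice}} and the {{LIVE}} counter are all hypothetical names made up for 
this example. It only shows how an intermediate object created inside a 
multiplication stays alive unless it is explicitly destroyed.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a matrix backed by an external resource (not the Ignite API). */
final class TrackedMatrix {
    /** Number of instances created but not yet destroyed. */
    static final AtomicInteger LIVE = new AtomicInteger();

    final int rows;
    final int cols;
    final double[][] data;
    private boolean destroyed;

    TrackedMatrix(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows][cols];
        LIVE.incrementAndGet();
    }

    /** Creates an empty matrix of the given shape, as a like-style factory would. */
    TrackedMatrix like(int rows, int cols) {
        return new TrackedMatrix(rows, cols);
    }

    /** Releases the (simulated) backing resource; safe to call more than once. */
    void destroy() {
        if (!destroyed) {
            destroyed = true;
            LIVE.decrementAndGet();
        }
    }

    /** Plain matrix product; the caller owns the result and must destroy it. */
    TrackedMatrix times(TrackedMatrix other) {
        if (cols != other.rows)
            throw new IllegalArgumentException("Shape mismatch");
        TrackedMatrix res = like(rows, other.cols);
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < other.cols; j++)
                for (int k = 0; k < cols; k++)
                    res.data[i][j] += data[i][k] * other.data[k][j];
        return res;
    }

    /** Computes (this * other) * other and releases the intermediate product. */
    TrackedMatrix timesTwice(TrackedMatrix other) {
        TrackedMatrix tmp = times(other);
        try {
            return tmp.times(other);
        } finally {
            tmp.destroy(); // Omitting this call is the hypothesized leak pattern.
        }
    }
}
```

With the {{destroy}} call in the {{finally}} block, the live-instance counter 
returns to zero once the caller releases its own matrices; without it, every 
call to {{timesTwice}} would leave one more instance behind.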

-----

*Update:* upon closer investigation the issue looks more complicated than I 
initially thought, and the hypothesis of a simple leak doesn't seem to explain 
what I observed when experimenting with this benchmark. Specifically, the 
benchmark seems to run smoothly with a "SIZE" parameter 4x smaller than that 
of other matrices (that is, with 16x (!) fewer elements in the test matrices).

But when I increased that SIZE param by a factor of 2, it ran unacceptably 
long. Specifically, with three hosts emulated on a single machine it ran for 
2 hours, and with 3 real hosts it ran for over half an hour. This is way too 
long and, additionally, the generated charts didn't look OK.
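
For a rough sense of scale, the arithmetic behind these numbers can be 
sketched as below. It assumes that SIZE is the side of a square matrix and 
that "by 2" means SIZE was doubled; {{SizeScaling}} and its methods are 
hypothetical helpers, not part of the benchmark. The cubic figure is the 
multiply-add count of a naive dense multiplication and is only a reference 
point, since the cost of a sparse distributed multiplication may scale 
differently.

```java
/** Hypothetical helper: how work grows when the side of a square matrix is scaled. */
final class SizeScaling {
    private SizeScaling() {
        // Static helpers only.
    }

    /** Ratio of element counts when the matrix side grows by the given factor (n^2 scaling). */
    static long elementRatio(long sideFactor) {
        return sideFactor * sideFactor;
    }

    /** Ratio of multiply-add counts for a naive dense multiplication (n^3 scaling). */
    static long naiveMulOpsRatio(long sideFactor) {
        return sideFactor * sideFactor * sideFactor;
    }
}
```

Under these assumptions, a 4x smaller SIZE gives {{elementRatio(4) = 16}} 
times fewer elements, and doubling SIZE gives 4 times more elements and 
{{naiveMulOpsRatio(2) = 8}} times more multiply-adds in the naive dense case.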

Given the above, my plan is as follows. After IGNITE-6123 is merged I will 
re-test the benchmark with the smaller size that was proven to run well and, 
if it stays OK, will tune it for that size, in order to let us have at least 
_some_ way to reliably test for possible regressions. As for the larger size, 
I plan to open a separate task to profile and investigate what's going on 
there, because what I observed looks quite worrying.


was (Author: oignatenko):
I experimented a bit with running the benchmarks against the ML codebase in 
master: the issue is still reproducible. Whatever causes it apparently hasn't 
been fixed yet. I plan to check whether there is some kind of leak: 
intermediate objects may be created by {{likeMatrix}} or {{likeVector}} 
(possibly, e.g., through {{copy}}) somewhere within multiplication without a 
corresponding {{destroy}}. In past experiments it was a missing {{destroy}} 
that caused yardstick to hang.

-----

*Update:* upon closer investigation the issue looks more complicated than I 
initially thought, and the hypothesis of a simple leak doesn't seem to explain 
what I observed when experimenting with this benchmark. Specifically, the 
benchmark seems to run smoothly with a "SIZE" parameter 4x smaller than that 
of other matrices (that is, with 16x (!) fewer elements in the test matrices). 
But when SIZE increases by a factor of 2, it seems to run unacceptably long 
(possibly hanging, hard to tell).

Given the above, my plan is as follows. After IGNITE-6123 is merged I will 
re-test the benchmark with the smaller size that was proven to run well and, 
if it stays OK, will tune it for that size, in order to let us have at least 
_some_ way to reliably test for possible regressions. As for the larger size, 
I plan to open a separate task to profile and investigate what's going on 
there, because what I observed looks quite worrying.

> performance measurement for SparseDistributedMatrix multiplication
> ------------------------------------------------------------------
>
>                 Key: IGNITE-7097
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7097
>             Project: Ignite
>          Issue Type: Task
>          Components: ml, yardstick
>            Reporter: Oleg Ignatenko
>            Assignee: Oleg Ignatenko
>             Fix For: 2.4
>
>
> We want to start tracking our performance to avoid performance degradation. 
> We also need some performance comparison with other ML libs.
> An initial draft of this benchmark was made per IGNITE-6123 (class 
> {{IgniteSparseDistributedMatrixMulBenchmark}}), but it currently hangs, so 
> it is excluded. Find a way to do it right.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
