[
https://issues.apache.org/jira/browse/IGNITE-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280426#comment-16280426
]
Oleg Ignatenko edited comment on IGNITE-7097 at 12/8/17 5:23 PM:
-----------------------------------------------------------------
I experimented a bit with running benchmarks using the ML codebase in master: the
issue is still reproducible, so whatever causes it apparently hasn't been fixed
yet. I plan to check whether there is some kind of leak: intermediate objects
might be created by {{likeMatrix}} or {{likeVector}} (possibly e.g. through
{{copy}}) somewhere within multiplication without a corresponding {{destroy}},
because in past experiments it was a missing {{destroy}} that caused yardstick
to hang.
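To illustrate the suspected pattern, here is a minimal, self-contained Java sketch. It is *not* the actual Ignite ML API: {{FakeMatrix}}, {{timesLeaky}} and {{timesFixed}} are hypothetical stand-ins for a cache-backed matrix whose storage must be released with an explicit {{destroy}}, and the counter only mimics how an intermediate created via {{likeMatrix}} but never destroyed would accumulate live storage.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch -- not the real Ignite ML API. It mimics the suspected
// bug: multiplication allocates an intermediate matrix via likeMatrix() and
// never calls destroy() on it, so backing storage keeps accumulating.
class LeakDemo {
    /** Count of "live" matrices whose backing storage was not yet released. */
    static final AtomicInteger LIVE = new AtomicInteger();

    /** Stand-in for a cache-backed matrix requiring explicit destroy(). */
    static class FakeMatrix {
        FakeMatrix() { LIVE.incrementAndGet(); }          // allocate storage
        FakeMatrix likeMatrix() { return new FakeMatrix(); }
        void destroy() { LIVE.decrementAndGet(); }        // release storage
    }

    /** Leaky variant: the intermediate from likeMatrix() is never destroyed. */
    static FakeMatrix timesLeaky(FakeMatrix a, FakeMatrix b) {
        FakeMatrix tmp = a.likeMatrix();   // intermediate -- leaked
        FakeMatrix res = a.likeMatrix();   // result handed to the caller
        return res;                        // tmp still holds storage
    }

    /** Fixed variant: matching destroy() on the intermediate plugs the leak. */
    static FakeMatrix timesFixed(FakeMatrix a, FakeMatrix b) {
        FakeMatrix tmp = a.likeMatrix();
        FakeMatrix res = a.likeMatrix();
        tmp.destroy();                     // release the intermediate
        return res;
    }
}
```

In this toy model each leaky multiplication leaves one extra live matrix behind, which is the kind of slow storage growth that could eventually stall a yardstick run.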
-----
*Update:* upon closer investigation the issue looks more complicated than I
initially thought, and the hypothesis of a simple leak doesn't seem to explain
what I observed when experimenting with this benchmark. Specifically, the
benchmark seems to run smoothly with a "SIZE" parameter 4x smaller than that of
the other matrices (that is, with 16x(!) fewer elements in the test matrices).
But when I doubled that SIZE param, it ran unacceptably long: with three hosts
emulated on a single machine it ran for 2 hours, and with 3 real hosts it ran
for over half an hour. This is way too long and, additionally, the generated
charts didn't look OK.
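The scaling behind these numbers can be sketched with plain back-of-envelope arithmetic (no Ignite dependency; {{SizeScaling}} is just an illustration): an n x n matrix holds n^2 elements, so a 4x smaller SIZE means 16x fewer elements, while the cost of naive matrix multiplication grows as n^3, so doubling SIZE quadruples the data but multiplies the multiplication work by 8.

```java
// Illustrative arithmetic only: how SIZE relates to element count
// (quadratic) and to naive multiplication cost (cubic).
class SizeScaling {
    /** Elements in an n x n matrix. */
    static long elements(long size) { return size * size; }

    /** Scalar multiply-adds in a naive n x n matrix multiplication. */
    static long mulOps(long size) { return size * size * size; }
}
```

This is consistent with a benchmark that behaves fine at the small size yet degrades sharply after a single doubling, though it doesn't by itself explain the bad-looking charts.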
Given the above, my plan is as follows. After IGNITE-6123 is merged I will
re-test the benchmark with the smaller size that was proven to run well and, if
it stays OK, will tune it for that size - in order to let us have at least
_some_ way to reliably test for possible regressions. As for the larger size, I
plan to open a separate task to profile and investigate what's going on there,
because what I observed looks quite worrying.
> performance measurement for SparseDistributedMatrix multiplication
> ------------------------------------------------------------------
>
> Key: IGNITE-7097
> URL: https://issues.apache.org/jira/browse/IGNITE-7097
> Project: Ignite
> Issue Type: Task
> Components: ml, yardstick
> Reporter: Oleg Ignatenko
> Assignee: Oleg Ignatenko
> Fix For: 2.4
>
>
> We want to start tracking our performance to avoid performance degradation.
> We also need some performance comparison with other ML libs.
> Initial draft for this benchmark was made per IGNITE-6123 (class
> {{IgniteSparseDistributedMatrixMulBenchmark}}) but it currently hangs so it
> is excluded. Find a way to do it right.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)