[
https://issues.apache.org/jira/browse/IGNITE-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200226#comment-16200226
]
Oleg Ignatenko commented on IGNITE-5535:
----------------------------------------
I implemented trial code for off-heap BLAS on linux-x86_64 in [branch
ignite-5535-1|https://github.com/gridgain/apache-ignite/tree/ignite-5535-1]
(also attached as a patch here:
[^IGNITE-5535.BLAS_support_for_offheap_vector_matrix.zip]).
This code was benchmarked against on-heap BLAS backed by netlib. The benchmarks
show that performance is about the same. Given that unexpected outcome I
took a closer look at netlib and discovered that its implementation uses JNI in
a way that doesn't require copying data from the Java heap (method
[GetPrimitiveArrayCritical|https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#GetPrimitiveArrayCritical_ReleasePrimitiveArrayCritical]).
This means that our implementation won't speed up on-heap BLAS handled by
netlib. This in turn limits the expected benefit solely to off-heap data
processing and, more specifically, to cases where copying data to the heap (for
further processing with netlib) is not feasible for some reason.
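
For illustration, the "copy to heap" alternative mentioned above could look
roughly like the sketch below. It is not code from the trial branch: the class
and method names are hypothetical, a direct NIO buffer stands in for off-heap
vector storage, and a plain Java loop stands in for the on-heap BLAS call that
netlib would handle.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;

/** Hypothetical sketch: process off-heap data by first copying it to the Java heap. */
public class OffHeapCopySketch {
    /** Puts values into a direct (off-heap) buffer, standing in for off-heap vector storage. */
    static DoubleBuffer toOffHeap(double[] values) {
        DoubleBuffer buf = ByteBuffer.allocateDirect(values.length * Double.BYTES)
            .order(ByteOrder.nativeOrder())
            .asDoubleBuffer();
        buf.put(values);
        buf.flip(); // make the written values readable from position 0
        return buf;
    }

    /** The copy step: this is the cost that may not be feasible for large data. */
    static double[] toHeap(DoubleBuffer offHeap) {
        double[] copy = new double[offHeap.remaining()];
        offHeap.duplicate().get(copy); // duplicate() leaves the caller's position untouched
        return copy;
    }

    /** Plain Java stand-in for an on-heap BLAS dot product. */
    static double dot(double[] x, double[] y) {
        if (x.length != y.length)
            throw new IllegalArgumentException("Vector lengths differ");
        double sum = 0.0;
        for (int i = 0; i < x.length; i++)
            sum += x[i] * y[i];
        return sum;
    }

    /** Off-heap inputs, on-heap computation. */
    static double offHeapDot(DoubleBuffer x, DoubleBuffer y) {
        return dot(toHeap(x), toHeap(y));
    }

    public static void main(String[] args) {
        DoubleBuffer x = toOffHeap(new double[] {1, 2, 3});
        DoubleBuffer y = toOffHeap(new double[] {4, 5, 6});
        System.out.println(offHeapDot(x, y));
    }
}
```

The extra allocation and copy in toHeap is the only overhead that off-heap
BLAS would remove, which is why the benefit is limited to cases where that copy
is not affordable.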
-----
In order to account for the change in our expectations (we initially assumed
an unconditional improvement) I took a closer look at the expected
implementation effort:\\
\\
- Coding. I expect we need to write 10-15K of code in addition to what is
already in the trial implementation: mostly build scripts plus some amount of
scaffolding in Java (and maybe a small bit of C). This estimate is primarily
based on what I have seen in the (properly designed) netlib project.
- Build. The trial implementation modified the ml module build in an
offensively straightforward way which is hardly appropriate for a proper part
of the project. In particular, packaging is set to "so" while ml.jar builds
only as a secondary target. Also, the trial implementation build won't work on
Windows. And even on Linux the build introduces some obscure dependencies on
parts of the GCC toolchain that need to be installed in order for it to work.
- Design. The trial implementation takes many shortcuts that need to be
addressed to keep the code maintainable. To start with, the way off-heap data
is exposed is rather blunt: the pointer is plainly passed all the way up with
dumb getters from the respective storage into the Vector and Matrix
implementations. Even if this turns out to be structurally the right way (which
I highly doubt), the naming of the methods just doesn't feel right ("ptr").\\
\\
Another important point is that the trial implementation doesn't take care of
platforms where we decide not to implement this feature (what would be the
fallback in these cases?), nor does it take care of cases when a supported
platform doesn't have the cblas library available (side note: in these cases it
would probably make sense to somehow reuse netlib's cblas "fallback" since we
have it anyway). This was okay for a trial implementation but looks totally
unacceptable in a proper part of the project.\\
\\
Last but not least, the trial implementation is designed to explicitly work
with concrete off-heap implementations of Vector and Matrix instead of the
respective interfaces. Some thought should be given to whether it is OK to
provide a public API like that (and if so, why) and, if it's not, how to
redesign it.
- Testing. This feature is platform and library dependent, which means it
should be tested on all platforms we decide to support plus at least one
platform that we decide to ignore. Also, on supported platforms testing has to
be done twice: first with the cblas library available and second when it is
not there.
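
To make the fallback and testing concerns above more concrete, here is a
minimal sketch of how backend selection might look. Everything in it is an
assumption for illustration only: the BlasBackendSketch class, the Dot
interface, the select method and the library name are all hypothetical, and
none of them exist in the trial branch or in netlib.

```java
import java.util.function.Supplier;

/** Hypothetical sketch: pick a native backend if its library loads, else fall back to Java. */
public class BlasBackendSketch {
    /** Minimal operation interface, so callers do not depend on a concrete backend. */
    interface Dot {
        double dot(double[] x, double[] y);
        String name();
    }

    /** Pure Java fallback for unsupported platforms or a missing cblas library. */
    static final class JavaDot implements Dot {
        @Override public double dot(double[] x, double[] y) {
            if (x.length != y.length)
                throw new IllegalArgumentException("Vector lengths differ");
            double sum = 0.0;
            for (int i = 0; i < x.length; i++)
                sum += x[i] * y[i];
            return sum;
        }

        @Override public String name() {
            return "java-fallback";
        }
    }

    /**
     * Tries to load the named native library. On success the native-backed
     * implementation is created by the supplied factory; on failure the
     * pure Java fallback is returned instead of propagating the error.
     */
    static Dot select(String libName, Supplier<Dot> nativeFactory) {
        try {
            System.loadLibrary(libName);
            return nativeFactory.get();
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return new JavaDot();
        }
    }

    public static void main(String[] args) {
        // The library name is made up, so loading is expected to fail here.
        Dot backend = select("hypothetical_offheap_cblas", () -> null);
        System.out.println(backend.name());
    }
}
```

With a structure like this, both test runs described above (library present
and library absent) exercise the same select entry point, and only the
environment differs.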
-----
To sum up, as of now implementing off-heap BLAS does not look worth the
effort.
I have no reason to expect that the use case where it would be beneficial (that
is, processing off-heap data such that copying it on-heap is not feasible) is
important enough to justify the effort described above.
This decision may be reconsidered later when we gain more understanding of the
expected usage of ML Grid.
> BLAS support for offheap vector/matrix
> --------------------------------------
>
> Key: IGNITE-5535
> URL: https://issues.apache.org/jira/browse/IGNITE-5535
> Project: Ignite
> Issue Type: Task
> Components: ml
> Reporter: Yury Babak
> Assignee: Oleg Ignatenko
> Attachments: IGNITE-5535.BLAS_support_for_offheap_vector_matrix.zip
>
>
> We want to add BLAS support for offheap structures. Currently we implement
> only the onheap version.