[ 
https://issues.apache.org/jira/browse/IGNITE-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200226#comment-16200226
 ] 

Oleg Ignatenko commented on IGNITE-5535:
----------------------------------------

I implemented trial code for off-heap BLAS on linux-x86_64 in [branch 
ignite-5535-1|https://github.com/gridgain/apache-ignite/tree/ignite-5535-1] 
(also attached as a patch here: 
[^IGNITE-5535.BLAS_support_for_offheap_vector_matrix.zip]).

This code was benchmarked against on-heap BLAS working with netlib. Benchmarks 
show that performance is about the same. Given that unexpected outcome, I 
took a closer look at netlib and discovered that its implementation uses JNI in 
a way that doesn't require copying data from the Java heap (method 
[GetPrimitiveArrayCritical|https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#GetPrimitiveArrayCritical_ReleasePrimitiveArrayCritical]).

This means that our implementation won't speed up on-heap BLAS handled by 
netlib. This in turn limits the expected benefit solely to off-heap data 
processing and, more specifically, to cases where copying data to the heap (for 
further processing with netlib) is not feasible for some reason.
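
To make the trade-off concrete, here is a minimal sketch of the "copy to heap, 
then process" path that off-heap BLAS would avoid. It is not taken from the 
trial branch: the class and method names are hypothetical, a direct buffer 
stands in for off-heap vector storage, and a plain-Java loop stands in for the 
on-heap ddot call that netlib would serve.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;

public class OffHeapCopySketch {
    /** Allocates a direct (off-heap) buffer holding the given values; stand-in for off-heap storage. */
    public static DoubleBuffer offHeap(double... vals) {
        DoubleBuffer buf = ByteBuffer.allocateDirect(vals.length * Double.BYTES)
            .order(ByteOrder.nativeOrder())
            .asDoubleBuffer();

        buf.put(vals);
        buf.flip();

        return buf;
    }

    /** Copies off-heap data into a heap array: the extra step an on-heap BLAS call requires. */
    public static double[] toHeap(DoubleBuffer buf) {
        double[] arr = new double[buf.remaining()];

        buf.duplicate().get(arr); // duplicate() leaves the source buffer's position untouched

        return arr;
    }

    /** Plain-Java dot product; an assumption standing in for an on-heap BLAS ddot. */
    public static double dot(double[] x, double[] y) {
        double sum = 0;

        for (int i = 0; i < x.length; i++)
            sum += x[i] * y[i];

        return sum;
    }

    public static void main(String[] args) {
        DoubleBuffer x = offHeap(1, 2, 3);
        DoubleBuffer y = offHeap(4, 5, 6);

        // Both operands are copied to the heap before the computation runs.
        System.out.println(dot(toHeap(x), toHeap(y))); // prints 32.0
    }
}
```

The copy in toHeap is the only cost that a native off-heap path would save, 
which is why the benefit is confined to data that cannot reasonably be copied.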

-----

To account for this change in our expectations (we initially assumed an 
unconditional improvement), I took a closer look at the expected implementation 
effort:\\
\\

- Coding. I expect we need to write 10-15K of code in addition to what is 
already in the trial implementation: mostly build scripts plus some amount of 
scaffolding in Java (and maybe a small bit of C). This estimate is primarily 
based on what I have seen in the (properly designed) netlib project.

- Build. The trial implementation modified the ml module build in an offensively 
straightforward way that is hardly appropriate for a proper part of the 
project. In particular, packaging is set to "so" while ml.jar builds only as a 
secondary target. Also, the trial implementation build won't work on Windows. 
And even on Linux, the build introduces some obscure dependencies on parts of 
the GCC toolchain that need to be installed for it to work.

- Design. The trial implementation takes many shortcuts that need to be 
addressed to keep the code maintainable. To start with, the way off-heap data 
is exposed is rather blunt: the pointer is plainly passed all the way up with 
dumb getters from the respective storage into the Vector and Matrix 
implementations. Even if this turns out to be structurally right (which I 
highly doubt), the naming of the methods just doesn't feel right ("ptr").\\
\\
Another important thing is that the trial implementation doesn't take care of 
platforms where we decide not to implement this feature (what would be the 
fallback in these cases?), nor does it take care of cases where a supported 
platform doesn't have the cblas library available (side note: in these cases it 
would probably make sense to somehow reuse netlib's cblas "fallback", since we 
have it anyway). This was okay for a trial implementation but looks totally 
unacceptable in a proper part of the project.\\
\\
Last but not least, the trial implementation is designed to work explicitly 
with concrete off-heap implementations of Vector and Matrix instead of the 
respective interfaces. Some thought should be given to whether it is OK to 
provide a public API like that (and if so, why), and if it is not, how to 
redesign it.

- Testing. This feature is platform and library dependent, which means it 
should be tested on all platforms we decide to support plus at least one 
platform that we decide to ignore. Also, on supported platforms testing has to 
be done twice: first with the cblas library available and then without it.
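
The fallback behavior from the Design item, and the three configurations from 
the Testing item, could look roughly like the sketch below. All names here 
(Blas, JavaBlas, select) are hypothetical and not part of the trial 
implementation or of netlib. The native loader is passed in as a Supplier so 
that the "library missing" case can be simulated without a real cblas binding, 
and the check for linux-x86_64 mirrors the only platform the trial code targets.

```java
import java.util.Locale;
import java.util.function.Supplier;

public class BlasFallbackSketch {
    /** Hypothetical minimal BLAS facade; only ddot is shown. */
    interface Blas {
        double ddot(double[] x, double[] y);

        String name();
    }

    /** Pure-Java fallback used when no native implementation can be loaded. */
    static final class JavaBlas implements Blas {
        @Override public double ddot(double[] x, double[] y) {
            double sum = 0;

            for (int i = 0; i < x.length; i++)
                sum += x[i] * y[i];

            return sum;
        }

        @Override public String name() {
            return "java-fallback";
        }
    }

    /** Assumption: only linux-x86_64 is supported, as in the trial implementation. */
    static boolean isSupportedPlatform(String osName, String osArch) {
        return osName.toLowerCase(Locale.ROOT).contains("linux")
            && (osArch.equals("amd64") || osArch.equals("x86_64"));
    }

    /** Picks the native implementation when possible, otherwise the fallback. */
    static Blas select(String osName, String osArch, Supplier<Blas> nativeLoader) {
        if (!isSupportedPlatform(osName, osArch))
            return new JavaBlas(); // platform we decided to ignore

        try {
            return nativeLoader.get();
        }
        catch (UnsatisfiedLinkError | RuntimeException e) {
            return new JavaBlas(); // supported platform, but the library is missing
        }
    }

    public static void main(String[] args) {
        // Simulates a supported platform where the cblas library cannot be loaded.
        Blas blas = select("Linux", "amd64", () -> {
            throw new UnsatisfiedLinkError("cblas not found (simulated)");
        });

        System.out.println(blas.name());
    }
}
```

With a structure like this, the test matrix reduces to three calls to select: 
an unsupported platform, a supported platform with a working loader, and a 
supported platform whose loader fails.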

-----

To sum up, as of now implementing off-heap BLAS does not look worth the 
effort.

I have no reason to expect that the use case where it would be beneficial (that 
is, processing off-heap data that cannot feasibly be copied on-heap) is 
important enough to justify the effort described above.

This decision may be reconsidered later, once we better understand the 
expected usage of ML Grid.

> BLAS support for offheap vector/matrix
> --------------------------------------
>
>                 Key: IGNITE-5535
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5535
>             Project: Ignite
>          Issue Type: Task
>          Components: ml
>            Reporter: Yury Babak
>            Assignee: Oleg Ignatenko
>         Attachments: IGNITE-5535.BLAS_support_for_offheap_vector_matrix.zip
>
>
> We want to add BLAS support for offheap structures. Currently we implement 
> only the onheap version.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
