[
https://issues.apache.org/jira/browse/ARROW-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243420#comment-16243420
]
Philipp Moritz edited comment on ARROW-1163 at 11/8/17 5:59 AM:
----------------------------------------------------------------
That makes sense for now, and I agree it's a little sad. For the future, maybe
you can get some insights from https://github.com/deeplearning4j/deeplearning4j
on how to write the Tensor class the "right" way. Unfortunately, Java doesn't
really have a long tradition of scientific computing like Python has, so there
is no good standard Tensor class like numpy's ndarray.
Edit: This is also an opportunity for Arrow: if we had a good Java tensor class,
it could be widely used because of the increasing importance of deep learning.
Another project to look at is https://github.com/intel-analytics/BigDL. We also
wrote our own in the past:
https://github.com/amplab/SparkNet/blob/master/src/main/scala/libs/NDArray.scala
and
https://github.com/amplab/SparkNet/blob/master/src/main/java/libs/JavaNDArray.java
to interop with Caffe and TensorFlow, but they might not be very useful for
shared memory.
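
To make the shared-memory concern concrete, here is a minimal sketch of what a
zero-copy Java tensor view could look like. Everything in it is an assumption
for illustration: the class name DoubleTensorView, the row-major layout, the
little-endian byte order and the restriction to doubles are not taken from
Arrow, deeplearning4j, BigDL or SparkNet.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Objects;

/** Hypothetical sketch: a row-major view of doubles over an existing buffer. */
final class DoubleTensorView {
    private final ByteBuffer data;  // shares memory with the caller's buffer
    private final int[] shape;
    private final int[] strides;    // in elements, not bytes

    DoubleTensorView(ByteBuffer buffer, int... shape) {
        int count = 1;
        int[] strides = new int[shape.length];
        // Row-major strides: the last dimension is contiguous.
        for (int i = shape.length - 1; i >= 0; i--) {
            strides[i] = count;
            count *= shape[i];
        }
        if (buffer.remaining() < (long) count * Double.BYTES) {
            throw new IllegalArgumentException("buffer too small for shape");
        }
        // slice() shares the bytes; the byte order is an assumption of this sketch.
        this.data = buffer.slice().order(ByteOrder.LITTLE_ENDIAN);
        this.shape = shape.clone();
        this.strides = strides;
    }

    // Translates a multi-dimensional index into a byte offset.
    private int byteOffset(int... index) {
        if (index.length != shape.length) {
            throw new IllegalArgumentException("wrong number of indices");
        }
        int offset = 0;
        for (int i = 0; i < index.length; i++) {
            Objects.checkIndex(index[i], shape[i]);
            offset += index[i] * strides[i];
        }
        return offset * Double.BYTES;
    }

    double get(int... index) {
        return data.getDouble(byteOffset(index));
    }

    void set(double value, int... index) {
        data.putDouble(byteOffset(index), value);
    }
}
```

Because the view only wraps a ByteBuffer, the same class would work over a
direct buffer or a MappedByteBuffer without copying the data onto the Java heap.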
> [Plasma] Java client for Plasma
> -------------------------------
>
> Key: ARROW-1163
> URL: https://issues.apache.org/jira/browse/ARROW-1163
> Project: Apache Arrow
> Issue Type: New Feature
> Reporter: Philipp Moritz
>
> We should start thinking about what a Java client for Plasma would look like.
> Given Arrow's focus on supporting Python, C++ and Java really well, it is
> the next important target after Python and C++.
> My preliminary thoughts are the following: We can either go with JNI and
> wrap the C++ client or (in my opinion preferable) write a pure Java client.
> It would communicate with the Plasma store via Java flatbuffers over
> sockets.
> It seems that the only thing blocking a pure Java client at the moment is the
> way we ship file descriptors for the memory mapped files between store and
> client (see the file fling.cc in the Plasma repo). We would need to get rid
> of that because there is no pure Java API that allows transferring file
> descriptors over a process boundary. So the way to transfer memory mapped
> files over process boundaries then is probably to use the file system and
> keep the memory mapped files in the file system instead of unlinking them
> immediately (as we do at the moment), so they can be opened by the client
> process via their path.
> The challenge in this case is how to clean the files up and make sure they
> are not left lying around if the Plasma store crashes. One option is to store
> the Plasma store PID with the file (i.e. as part of the file name) and let
> the Plasma store clean them up the next time it is started; maybe there is
> OS-level support for temporary files we can reuse.
> I probably won't get to this for a while, so if anybody needs this or has
> free cycles, they should feel free to chime in. Also opinions on the design
> are appreciated!
> -- Philipp.
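
The path-based approach described in the issue, including the PID-in-file-name
cleanup, can be sketched in pure Java roughly as follows. This is only an
illustration under stated assumptions: the class PlasmaFileSketch, the naming
scheme plasma-<pid>-<object id>.mmap and all method names are hypothetical and
not part of Plasma. The liveness check uses ProcessHandle, which needs Java 9
or later, and it can be fooled if the OS has reused the PID.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.LongPredicate;

/** Hypothetical sketch; not part of Plasma or Arrow. */
final class PlasmaFileSketch {
    // Assumed naming scheme: plasma-<store pid>-<object id>.mmap
    private static final String PREFIX = "plasma-";
    private static final String SUFFIX = ".mmap";

    static Path fileFor(Path dir, long storePid, String objectId) {
        return dir.resolve(PREFIX + storePid + "-" + objectId + SUFFIX);
    }

    /** Store side: create the backing file, size it, and map it read-write. */
    static MappedByteBuffer create(Path file, int size) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.setLength(size);
            // The mapping stays valid after the channel is closed.
            return raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, size);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Client side: open the file by path instead of receiving a descriptor. */
    static MappedByteBuffer open(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Store startup: delete files whose embedded PID no longer belongs to a live process. */
    static int cleanStale(Path dir, LongPredicate isAlive) {
        int deleted = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, PREFIX + "*" + SUFFIX)) {
            for (Path f : files) {
                String name = f.getFileName().toString();
                int dash = name.indexOf('-', PREFIX.length());
                if (dash < 0) {
                    continue; // does not follow the assumed naming scheme
                }
                long pid;
                try {
                    pid = Long.parseLong(name.substring(PREFIX.length(), dash));
                } catch (NumberFormatException e) {
                    continue;
                }
                if (!isAlive.test(pid) && Files.deleteIfExists(f)) {
                    deleted++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return deleted;
    }

    /** Default liveness check; a reused PID would make a stale file look live. */
    static boolean processExists(long pid) {
        return ProcessHandle.of(pid).isPresent();
    }
}
```

On startup a store could call
PlasmaFileSketch.cleanStale(dir, PlasmaFileSketch::processExists) before
creating new objects. Taking the liveness check as a parameter keeps the
cleanup logic testable without spawning real processes.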