[ 
https://issues.apache.org/jira/browse/ARROW-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16242190#comment-16242190
 ] 

Lu Qi  commented on ARROW-1163:
-------------------------------

Hi Philipp Moritz,
I've been working on reading and writing tensors in Java for several weeks. My 
Tensor structure looks like this:
class Tensor { private float[] storage; private int[] shape; }
I used JNI to leverage the Plasma C++ client. One good thing is that, when 
writing a tensor, the JNI function GetPrimitiveArrayCritical can return the 
address of the array in the Java heap (depending on the VM implementation), 
so I can construct the Tensor in C++ easily without copying. This stops GC 
while the array is held, but the Plasma write is non-blocking. On the other 
hand, when reading a tensor, I need to copy the shared memory into the Java 
heap, which costs time. So, to save reading time, a pure Java client may be a 
good choice. 
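
To illustrate the read-side saving, here is a minimal sketch of reading a 
tensor through a memory-mapped file in pure Java, so the float data is 
accessed through a view of the mapping instead of being copied into a 
float[] on the Java heap. The class name MappedTensor and the file layout 
(int32 rank, then the dimensions, then float32 values, native byte order) are 
assumptions made up for this example; this is not Plasma's actual object 
format or API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical tensor wrapper; the on-disk layout below is an assumption
// for illustration only, not the Plasma store's format.
final class MappedTensor {
    final int[] shape;
    final FloatBuffer data; // view over the mapped region, not a heap copy

    private MappedTensor(int[] shape, FloatBuffer data) {
        this.shape = shape;
        this.data = data;
    }

    // Writes: int32 rank, rank x int32 dims, then float32 values.
    static void write(Path path, int[] shape, float[] values) {
        ByteBuffer buf = ByteBuffer
                .allocate(4 + 4 * shape.length + 4 * values.length)
                .order(ByteOrder.nativeOrder());
        buf.putInt(shape.length);
        for (int d : shape) {
            buf.putInt(d);
        }
        for (float v : values) {
            buf.putFloat(v);
        }
        try {
            Files.write(path, buf.array());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Maps the file read-only and exposes the values as a FloatBuffer view.
    static MappedTensor read(Path path) {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer map =
                    ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            map.order(ByteOrder.nativeOrder());
            int rank = map.getInt();
            int[] shape = new int[rank];
            for (int i = 0; i < rank; i++) {
                shape[i] = map.getInt();
            }
            // The view starts at the current position, i.e. after the header,
            // and inherits the byte order set above.
            return new MappedTensor(shape, map.asFloatBuffer());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling MappedTensor.read(path) returns the shape plus a direct FloatBuffer; 
the mapping stays valid after the channel is closed. Only the small shape 
header is copied to the heap.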

As for a pure Java client, maybe we can use JNI to get the fd first and then 
construct a FileDescriptor from it:
https://stackoverflow.com/questions/4845122/using-a-numbered-file-descriptor-from-java
 


> [Plasma] Java client for Plasma
> -------------------------------
>
>                 Key: ARROW-1163
>                 URL: https://issues.apache.org/jira/browse/ARROW-1163
>             Project: Apache Arrow
>          Issue Type: New Feature
>            Reporter: Philipp Moritz
>
> We should start thinking about what a Java client for plasma would look like. 
> Given the focus of arrow to support Python, C++ and Java really well, it is 
> the next important target after Python and C++.
> My preliminary thoughts on it are the following ones: We can either go with 
> JNI and wrap the C++ client or (in my opinion preferable) write a pure Java 
> client. It would communicate with the Plasma store via Java flatbuffers over 
> sockets.
> It seems that the only thing blocking a pure Java client at the moment is the 
> way we ship file descriptors for the memory mapped files between store and 
> client (see the file fling.cc in the Plasma repo). We would need to get rid 
> of that because there is no pure Java API that allows transferring file 
> descriptors over a process boundary. So the way to transfer memory mapped 
> files over process boundaries then is probably to use the file system and 
> keep the memory mapped files in the file system instead of unlinking them 
> immediately (as we do at the moment), so they can be opened by the client 
> process via their path.
> The challenge in this case is how to clean the files up and make sure they 
> are not lying around if the plasma store crashes. One option is to store the 
> plasma store PID with the file (i.e. as part of the file name) and let the 
> plasma store clean them up the next time it is started; maybe there is OS 
> level support for temporary files we can reuse.
> I probably won't get to this for a while, so if anybody needs this or has 
> free cycles, they should feel free to chime in. Also opinions on the design 
> are appreciated!
> -- Philipp.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
