[ https://issues.apache.org/jira/browse/ARROW-9513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303899#comment-17303899 ]
Liya Fan commented on ARROW-9513:
---------------------------------

[~benmosher] Thanks for your feedback. Your comments about {{allocator}} make sense to me.
Do you want to provide a PR to improve the documentation?

> [Java] Improve documentation in regards to basic-usage / memory-management
> --------------------------------------------------------------------------
>
>                 Key: ARROW-9513
>                 URL: https://issues.apache.org/jira/browse/ARROW-9513
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: Documentation, Java
>    Affects Versions: 0.17.1
>            Reporter: sascha schnug
>            Priority: Minor
>
> I'm experimenting with Arrow using Java, C++ and Python + IPC format
> (Bytestream, File) and Parquet: I am struggling a lot on the Java side, even
> after looking for external resources and some code-reading within the
> dev-repository.
>
> Observing the state of the documentation, there seems to be a strong favour
> in regards to C++ and Python, which is not surprising. The Java part, however,
> is hard to work with (at least for me; it might be possible that I'm the
> problem though). Sadly, the Java interface is also the one that diverges
> most from what people would usually do in Java.
>
> Acknowledging the user-guide like documentation from
> [repo/java|https://github.com/apache/arrow/tree/master/java#getting-started]
> (-I don't think this is referenced in the docs and it might only be
> referenced by the java-part of the repository > looks "hidden" as the
> Java-link in the docs points to Javadoc-based content- -> known issue:
> ARROW-9364) and its warnings about VectorSchemaRoot being special and
> temporary, and also reading [this external
> article|https://www.infoq.com/articles/apache-arrow-java],
> which also talks about manual memory-management, I'm still struggling with a
> very simple use-case:
>
> - create and fill VectorSchemaRoot
> - write VectorSchemaRoot in IPC format to disk
> - read VectorSchemaRoot from IPC format from disk
> - INTO some out of scope object not owned by the reader!
>
> I won't put example code here, but refer to my StackOverflow question
> showing my problem:
> [StackOverflow|https://stackoverflow.com/q/62938237/2320035]
>
> Something about memory-ownership is not working as expected for me.
>
> No matter what tests (dev-repo) or article (e.g. the second link above) I
> read, their examples did not help me here, as those all are *processing* the
> data read in *within the reader-scope* (mostly a simple elementwise check),
> while I want to read into some *global* object which outlives the
> reader-object (see my code on SO or the second link: printing out read data
> works as long as the reader is open).
>
> The article above also says:
>
> {code:java}
> A vector is managed by one allocator. We say that the allocator owns the
> buffer backing the vector. Vector ownership can be transferred from one
> allocator to another.
> {code}
>
> But how exactly would I populate an empty VectorSchemaRoot (of my class)
> with whatever I read in, so that it survives closing the reader? I
> experimented with VectorLoader and VectorUnloader, including usage of the
> only call I found which has "ownership" in its docstring
> (batch.cloneWithTransfer), but with no success.
> And even if it worked, the Java-based RecordBatch
> [link|https://arrow.apache.org/docs/java/], which would be the one to use for
> this, looks completely different from what Python's looks like
> [link|https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatch.htm].
>
> Should I be able to see my problem given the documentation? Is there
> anything else to read? (I know that there must be something in this regard
> within some Flight / Gandiva project-code, but I did not find it yet.)
>
> Or would it be completely wrong to keep VectorSchemaRoot as the core object
> to handle all my data?
>
> Feel free to close this issue if you think that the documentation is *not*
> incomplete.
>
> Thanks,
> Sascha



--
This message was sent by Atlassian Jira
(v8.3.4#803005)