[
https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911922#comment-16911922
]
Micah Kornfield commented on ARROW-6206:
----------------------------------------
"iiuc arrow is a team that picked up netty derived off-heap tools naively and
demonstrated that in 2019 it's still prone to some gotchas that are a little
bit stronger than edge cases when the unit tests pass."
It is true the Java Arrow library has a steep learning curve, and could use
better documentation so new developers aren't bitten. There has also been less
focus on the non-core Java libraries (i.e. adapters) until recently, and we
need to do something to distinguish the maturity between them so these types of
things are less surprising. If you have suggestions please let us know. I
would suggest perhaps sending mail to the dev@ or user@ mailing lists, since
generally more people monitor those than conversations on JIRA. FWIW, the core
library was adapted from Apache Drill and is used by Dremio in their product,
both of which, iiuc, are long-running processes that provide competitive
analytic performance (I don't know how prone to resource leakage they are).
"and gave me the confidence to assume this will do the job faster than python.
and so began this thread on 800+ megabytes of data."
I'm sorry you ran into this. If you are working in the Python ecosystem,
Turbodbc might be your best bet for getting data into Arrow. In general, most
of the Python code is just a facade on top of C++, so I would expect it to be
pretty performant. Please discuss on the mailing list or
continue to file JIRAs if you are seeing unexpected performance/behavior. We
want to know.
"you should really put a consumer facing notice on where NIO is and is not
present."
Would you mind opening up a JIRA/Pull Request describing how you think it is
best to publicize it?
> [Java][Docs] Document environment variables/java properties
> -----------------------------------------------------------
>
> Key: ARROW-6206
> URL: https://issues.apache.org/jira/browse/ARROW-6206
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Documentation, Java
> Reporter: Micah Kornfield
> Assignee: Ji Liu
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> Specifically, "-Dio.netty.tryReflectionSetAccessible=true" for JVMs >= 9 and
> BoundsChecking/NullChecking for get.
>
>
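As a rough illustration of the settings named in the issue description, here
is a minimal sketch that reports whether they are set in the running JVM. Only
"io.netty.tryReflectionSetAccessible" comes from the issue text. The two
"arrow.*" property names and the JvmFlagReport class are assumptions made for
illustration, so please check them against the BoundsChecking and
NullCheckingForGet sources before relying on them.

```java
public class JvmFlagReport {
    // Taken from the issue description.
    static final String NETTY_REFLECTION = "io.netty.tryReflectionSetAccessible";
    // ASSUMPTION: property names as I recall them; verify against the Arrow sources.
    static final String ARROW_UNSAFE_ACCESS = "arrow.enable_unsafe_memory_access";
    static final String ARROW_NULL_CHECK = "arrow.enable_null_check_for_get";

    /** Maps a java.specification.version string ("1.8", "9", "11") to 8, 9, 11. */
    static int featureVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int first = Integer.parseInt(parts[0]);
        // Java 8 and earlier report "1.x"; Java 9 and later report "x".
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    /** The issue says the Netty flag matters for JVMs >= 9. */
    static boolean nettyFlagRelevant(String specVersion) {
        return featureVersion(specVersion) >= 9;
    }

    /** Formats one property as "key=value", or "key=<unset>" when absent. */
    static String describe(String key) {
        String value = System.getProperty(key);
        return key + "=" + (value == null ? "<unset>" : value);
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("java.specification.version=" + spec);
        System.out.println("Netty reflection flag relevant: " + nettyFlagRelevant(spec));
        System.out.println(describe(NETTY_REFLECTION));
        System.out.println(describe(ARROW_UNSAFE_ACCESS));
        System.out.println(describe(ARROW_NULL_CHECK));
    }
}
```

Running it as "java -Dio.netty.tryReflectionSetAccessible=true JvmFlagReport"
should show the Netty flag as set and the other two as unset unless they are
passed the same way.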