[ https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911922#comment-16911922 ]

Micah Kornfield commented on ARROW-6206:
----------------------------------------

"iiuc arrow is a team that picked up netty derived off-heap tools naively and 
demonstrated that in 2019 it's still prone to some gotchas that are a little 
bit stronger than edge cases when the unit tests pass."

It is true the Java Arrow library has a steep learning curve and could use
better documentation so new developers aren't bitten.  There has also been less
focus on the non-core Java libraries (i.e. adapters) until recently, and we
need some way to signal the difference in maturity between them and the core
library so these types of things are less surprising.  If you have suggestions,
please let us know.  I would suggest sending mail to the dev@ or user@ mailing
lists, since generally more people monitor those than conversations on JIRA.
FWIW, the core library was adapted from Apache Drill and is used by Dremio in
their product, both of which, if I understand correctly, are long-running
processes that provide competitive analytic performance (I don't know how prone
to resource leakage they are).

"and gave me the confidence to assume this will do the job faster than python. 
and so began this thread on 800+ megabytes of data."

I'm sorry you ran into this.  If you are working in the Python ecosystem, I
think Turbodbc might be your best bet for getting data into Arrow.  In general,
most of the Python code is just a facade on top of C++, so I would expect it to
be pretty performant.  Please discuss on the mailing list or continue to file
JIRAs if you are seeing unexpected performance or behavior.  We want to know.

"you should really put a consumer facing notice on where NIO is and is not 
present."

Would you mind opening up a JIRA/Pull Request describing how you think it is 
best to publicize it?

> [Java][Docs] Document environment variables/java properties
> -----------------------------------------------------------
>
>                 Key: ARROW-6206
>                 URL: https://issues.apache.org/jira/browse/ARROW-6206
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Documentation, Java
>            Reporter: Micah Kornfield
>            Assignee: Ji Liu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.15.0
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Specifically, "-Dio.netty.tryReflectionSetAccessible=true" for JVMs >= 9 and 
> BoundsChecking/NullChecking for get.
>  
>  
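For the first item in the description, here is a minimal sketch of a startup check an application might run. It assumes the application embeds Arrow Java (whose allocator sits on Netty) and wants to warn when it is running on Java 9 or later without -Dio.netty.tryReflectionSetAccessible=true. The class ArrowJvmFlagCheck and its methods are hypothetical and not part of Arrow. The sketch uses only the standard java.specification.version system property and the Netty property name quoted in the issue.

```java
/**
 * Hypothetical startup check (not part of Apache Arrow): warns when running on
 * Java 9+ without -Dio.netty.tryReflectionSetAccessible=true.
 */
public class ArrowJvmFlagCheck {

    // Property name taken from the ARROW-6206 description.
    static final String NETTY_FLAG = "io.netty.tryReflectionSetAccessible";

    /** Parses the major version from "java.specification.version" ("1.8" -> 8, "11" -> 11). */
    static int majorJavaVersion(String spec) {
        if (spec.startsWith("1.")) {
            return Integer.parseInt(spec.substring(2));
        }
        return Integer.parseInt(spec);
    }

    /** True when the JVM is 9+ and the flag is absent or not "true" (parseBoolean(null) is false). */
    static boolean flagMissing(int javaMajor, String flagValue) {
        return javaMajor >= 9 && !Boolean.parseBoolean(flagValue);
    }

    public static void main(String[] args) {
        int major = majorJavaVersion(System.getProperty("java.specification.version"));
        String value = System.getProperty(NETTY_FLAG);
        if (flagMissing(major, value)) {
            System.err.println("Java " + major + " detected: add -D" + NETTY_FLAG
                    + "=true to the JVM arguments");
        } else {
            System.out.println("JVM flags look fine for Java " + major);
        }
    }
}
```

The check only reports. The actual fix is passing the flag when launching the JVM, e.g. java -Dio.netty.tryReflectionSetAccessible=true -jar myapp.jar (myapp.jar being a placeholder).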



--
This message was sent by Atlassian Jira
(v8.3.2#803003)