[ 
https://issues.apache.org/jira/browse/ARROW-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16289496#comment-16289496
 ] 

ASF GitHub Bot commented on ARROW-1920:
---------------------------------------

jcrist commented on issue #1418: ARROW-1920 [C++/Python] Add ORC Reader
URL: https://github.com/apache/arrow/pull/1418#issuecomment-351447660
 
 
   A few high level notes:
   
   - Following the example in #1026, I put the C++ code in
`src/arrow/adapters/orc/*`. This is nice, as Arrow has a more active community
than `apache/orc`. However, it does mean that the underlying classes in liborc
(e.g. `ColumnStatistics`, `ReaderOptions`, etc.) can't be exposed as part of
the API. An alternative version would add Arrow support in `apache/orc`, and
wrap that in `pyarrow`. My main goal is to add ORC reading support to pyarrow,
so whatever library structure best allows that is fine with me.
   
   - This is the first time I've written C++ in 10 years, so I've probably made
some naive mistakes. Criticism welcome :).
   
   - Since liborc and pyarrow share numerous dependencies, it's important that 
the versions of these dependencies match. As such, I haven't added a 
`FindORC.cmake` script - a custom build of `liborc.a` is required to ensure the 
dependencies match.
   
   - I'm not sure how to add tests here. There are numerous example files in
`apache/orc` that I used to test locally, but I don't know whether or how we
can integrate those into the tests here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Add support for reading ORC files
> ---------------------------------
>
>                 Key: ARROW-1920
>                 URL: https://issues.apache.org/jira/browse/ARROW-1920
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++, Python
>            Reporter: Jim Crist
>              Labels: pull-request-available
>
> Would be nice to be able to read ORC files in pyarrow, similar to the already 
> existing parquet support.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
