[ https://issues.apache.org/jira/browse/ORC-508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853448#comment-16853448 ]

Owen O'Malley commented on ORC-508:
-----------------------------------

The main problem is going to be that a couple of Hadoop classes are part of the 
API. We can't remove them without breaking compatibility. I'd suggest making a 
new module (orc-hadoop-proxy?) that contains a few classes satisfying the 
required contract.

I assume that you only care about core and not mapreduce or tools.

Classes that I know about:
* Configuration
* FileSystem
* Path
* VersionInfo

You would then be able to add the module into the classpath instead of Hadoop 
and have the rest of the ORC library work as intended.
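A minimal sketch of what such proxy classes might look like. The class names (Configuration, Path) mirror the Hadoop originals listed above; the method subsets, fields, and the ProxyDemo driver are illustrative assumptions, not the actual contract ORC requires.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.conf.Configuration:
// a plain key/value store with defaulted lookups.
class Configuration {
    private final Map<String, String> props = new HashMap<>();

    public void set(String key, String value) {
        props.put(key, value);
    }

    public String get(String key, String defaultValue) {
        return props.getOrDefault(key, defaultValue);
    }
}

// Hypothetical stand-in for org.apache.hadoop.fs.Path:
// wraps a URI-like string and exposes the final path component.
class Path {
    private final String location;

    public Path(String location) {
        this.location = location;
    }

    public String getName() {
        int slash = location.lastIndexOf('/');
        return slash < 0 ? location : location.substring(slash + 1);
    }

    @Override
    public String toString() {
        return location;
    }
}

public class ProxyDemo {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("orc.compress", "ZSTD");
        Path p = new Path("s3://bucket/warehouse/data.orc");

        System.out.println(conf.get("orc.compress", "ZLIB")); // ZSTD
        System.out.println(p.getName());                      // data.orc
    }
}
```

The point is that a consumer like Beam could put a small module of such classes on the classpath instead of the full Hadoop dependency, while ORC's core keeps compiling against the same names.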

> Add a reader/writer that does not depend on Hadoop FileSystem
> -------------------------------------------------------------
>
>                 Key: ORC-508
>                 URL: https://issues.apache.org/jira/browse/ORC-508
>             Project: ORC
>          Issue Type: Improvement
>          Components: Java
>            Reporter: Ismaël Mejía
>            Priority: Major
>
> It seems that Orc's default implementation classes today depend on Hadoop FS 
> objects to write. This is not ideal for APIs that do not rely on Hadoop. For 
> some context, I was taking a look at adding support for Apache Beam, but 
> Beam's API supports multiple filesystems through a more generic abstraction 
> based on Java's Channels and Streams APIs, which delegate directly to 
> distributed filesystems, e.g. Google Cloud Storage, Amazon S3, etc. It would 
> be really nice to have such support in the core implementation and to maybe 
> split the Hadoop-dependent implementation into its own module in the future.
>  
> After a look at some parts of the `orc-core`



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
