[
https://issues.apache.org/jira/browse/ORC-508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853448#comment-16853448
]
Owen O'Malley commented on ORC-508:
-----------------------------------
The main problem is going to be that a couple of Hadoop classes are in the
API. We can't remove them without breaking compatibility. I'd suggest making
a new module (orc-hadoop-proxy?) that contains a few classes that satisfy the
required contract.
I assume that you only care about core and not mapreduce or tools.
Classes that I know about:
* Configuration
* FileSystem
* Path
* VersionInfo
You would then be able to add the module into the classpath instead of Hadoop
and have the rest of the ORC library work as intended.
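To make the idea concrete, a minimal sketch of what such stand-in classes could look like, backed by plain java.io/java.nio instead of Hadoop (the class shapes and method signatures here are hypothetical illustrations, not an agreed API for the proposed module):

```java
// HadoopShim.java -- hypothetical sketch of minimal, Hadoop-free stand-ins
// for the classes listed above, enough to satisfy a reader/writer contract
// against the local filesystem only.
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class HadoopShim {

    // Stand-in for org.apache.hadoop.conf.Configuration: a plain key/value map.
    public static class Configuration {
        private final Map<String, String> props = new HashMap<>();
        public void set(String key, String value) { props.put(key, value); }
        public String get(String key, String defaultValue) {
            return props.getOrDefault(key, defaultValue);
        }
    }

    // Stand-in for org.apache.hadoop.fs.Path: wraps a plain path string.
    public static class Path {
        private final String path;
        public Path(String path) { this.path = path; }
        @Override public String toString() { return path; }
    }

    // Stand-in for org.apache.hadoop.fs.FileSystem: creates local files
    // via java.nio.file, with no Hadoop dependency.
    public static class FileSystem {
        public OutputStream create(Path p) throws IOException {
            return Files.newOutputStream(Paths.get(p.toString()));
        }
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("orc.compress", "ZLIB");
        System.out.println(conf.get("orc.compress", "NONE"));
    }
}
```

The point is only that the proxy classes need the same names and just enough of the same surface for existing ORC call sites to link against, not the full Hadoop behavior.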
> Add a reader/writer that does not depend on Hadoop FileSystem
> -------------------------------------------------------------
>
> Key: ORC-508
> URL: https://issues.apache.org/jira/browse/ORC-508
> Project: ORC
> Issue Type: Improvement
> Components: Java
> Reporter: Ismaël Mejía
> Priority: Major
>
> It seems that ORC's default implementation classes today depend on
> Hadoop FS objects to write. This is not ideal for APIs that do not rely on
> Hadoop. For some context, I was looking at adding support for Apache
> Beam, but Beam's API supports multiple filesystems through a more generic
> abstraction that relies on Java's Channels and Streams APIs, delegating
> directly to distributed filesystems, e.g. Google Cloud Storage, Amazon S3,
> etc. It would be really nice to have such support in the core
> implementation and maybe to split the Hadoop-dependent implementation into
> its own module in the future.
>
>
> After a look at some parts of the `orc-core`
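For reference, the kind of Hadoop-free sink the issue describes could be sketched with Java's NIO channels; any filesystem that exposes a WritableByteChannel (local file, GCS, S3 via a Beam-style filesystem layer) would work as the target. This is an illustrative sketch, not existing ORC code:

```java
// ChannelSink.java -- hypothetical sketch: write bytes through a
// WritableByteChannel, the generic abstraction the issue refers to,
// with no Hadoop classes involved.
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

public class ChannelSink {
    // Write the buffer fully; a single write() call may consume only part of it.
    static void writeFully(WritableByteChannel ch, ByteBuffer buf) throws IOException {
        while (buf.hasRemaining()) {
            ch.write(buf);
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a remote filesystem's output channel.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        WritableByteChannel ch = Channels.newChannel(out);
        writeFully(ch, ByteBuffer.wrap("ORC".getBytes("UTF-8")));
        ch.close();
        System.out.println(out.toString("UTF-8"));
    }
}
```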
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)