[
https://issues.apache.org/jira/browse/ORC-151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886915#comment-15886915
]
Owen O'Malley commented on ORC-151:
-----------------------------------
[~istvan] It would be a lot of work to completely remove the Hadoop
dependencies. If you want to investigate it, you should probably open a new
jira. In particular, you'd need to:
* Remove the Hadoop dependency from hive-storage, which would require:
** Providing alternatives to the uses of Writable without breaking
compatibility. At the very least, you'll need new implementations of
DecimalColumnVector, DateColumnVector, and TimestampColumnVector.
** Replacing the functionality of WritableUtils, but that should be pretty
easy.
* If you are trying to get rid of problematic libraries, you really should
remove the use of guava from hive-storage too. I thought I had removed those
uses at one point, but they are still there.
* You can't change the definition of orc-core without breaking compatibility,
but you could create a new module (orc-kernel?) that houses the hadoop-clean
code.
* PhysicalWriter already avoids the FileSystem on the write side; you could
try making a PhysicalReader for the read path.
* You'll need to deal with Configuration in a backwards compatible way.
All of this without introducing a performance penalty for the Hadoop users. If
your goal is just reducing the size of the jar, it doesn't seem worth the work.
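To make the WritableUtils point concrete, a minimal stand-in for its varint
helpers might look like the sketch below. Note this uses protobuf-style
zig-zag varints rather than Hadoop's actual wire format, and VarIntCodec is an
invented name; a true drop-in replacement would have to reproduce
WritableUtils' exact encoding for compatibility.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Illustrative stand-in for WritableUtils.writeVLong/readVLong. Uses
// protobuf-style zig-zag varints, NOT Hadoop's wire format; a drop-in
// replacement would need to copy WritableUtils' exact encoding instead.
final class VarIntCodec {

    // Append value to out as a zig-zag varint (small magnitudes stay small).
    static void writeVLong(ByteArrayOutputStream out, long value) {
        long v = (value << 1) ^ (value >> 63);     // zig-zag: sign moves to bit 0
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));  // low 7 bits + continuation flag
            v >>>= 7;
        }
        out.write((int) v);
    }

    // Read one zig-zag varint from the buffer's current position.
    static long readVLong(ByteBuffer in) {
        long v = 0;
        int shift = 0;
        int b;
        do {
            b = in.get() & 0xFF;
            v |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (v >>> 1) ^ -(v & 1);               // undo zig-zag
    }

    public static void main(String[] args) {
        long[] samples = {0, 1, -1, 300, Long.MAX_VALUE, Long.MIN_VALUE};
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (long s : samples) {
            writeVLong(out, s);
        }
        ByteBuffer in = ByteBuffer.wrap(out.toByteArray());
        for (long s : samples) {
            if (readVLong(in) != s) {
                throw new AssertionError("round-trip failed for " + s);
            }
        }
        System.out.println("round-trip ok");
    }
}
```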
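Since PhysicalReader is only a proposal at this point, its shape is open. One
hypothetical cut at the interface, with an in-memory implementation to show how
a non-FileSystem backend would plug in (all names and signatures below are
invented for illustration):

```java
import java.io.IOException;
import java.nio.ByteBuffer;

// Sketch only: PhysicalReader does not exist in ORC here (the comment above
// proposes adding it); every name and signature is hypothetical.
interface PhysicalReader {
    long getFileLength() throws IOException;

    // Return a view of `length` bytes starting at `offset`.
    ByteBuffer readFully(long offset, int length) throws IOException;

    void close() throws IOException;
}

// In-memory backend: no Hadoop FileSystem anywhere on the read path.
final class ByteBufferReader implements PhysicalReader {
    private final ByteBuffer data;

    ByteBufferReader(ByteBuffer data) {
        this.data = data.duplicate();
    }

    @Override
    public long getFileLength() {
        return data.limit();
    }

    @Override
    public ByteBuffer readFully(long offset, int length) {
        ByteBuffer view = data.duplicate();
        view.position((int) offset).limit((int) offset + length);
        return view.slice();
    }

    @Override
    public void close() {
    }

    public static void main(String[] args) {
        ByteBufferReader r = new ByteBufferReader(
            ByteBuffer.wrap(new byte[]{10, 20, 30, 40}));
        ByteBuffer part = r.readFully(1, 2);
        System.out.println(part.get(0) + " " + part.get(1));  // prints "20 30"
    }
}
```

A FileSystem-backed implementation of the same interface would keep the
existing Hadoop behavior, so the two paths can coexist.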
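For the Configuration point, one backwards-compatible approach is a thin
settings interface that Hadoop users back with Configuration (delegating to
Configuration.get) and non-Hadoop users back with a plain map. OrcSettings and
MapSettings below are invented names, a sketch of the idea rather than an
actual ORC API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: OrcSettings/MapSettings are invented names. The idea
// is a thin interface in front of Hadoop's Configuration, so the Hadoop path
// stays compatible (a Configuration-backed implementation would delegate to
// Configuration.get) while non-Hadoop callers pass plain key/value pairs.
interface OrcSettings {
    String get(String key, String defaultValue);

    default long getLong(String key, long defaultValue) {
        String v = get(key, null);
        return v == null ? defaultValue : Long.parseLong(v);
    }
}

// Map-backed implementation for callers with no Hadoop on the classpath.
final class MapSettings implements OrcSettings {
    private final Map<String, String> values = new HashMap<>();

    MapSettings set(String key, String value) {
        values.put(key, value);
        return this;
    }

    @Override
    public String get(String key, String defaultValue) {
        return values.getOrDefault(key, defaultValue);
    }

    public static void main(String[] args) {
        OrcSettings conf = new MapSettings().set("orc.stripe.size", "67108864");
        System.out.println(conf.getLong("orc.stripe.size", 0L));  // prints 67108864
    }
}
```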
> Cut down on the size of the tools jar by excluding more
> -------------------------------------------------------
>
> Key: ORC-151
> URL: https://issues.apache.org/jira/browse/ORC-151
> Project: Orc
> Issue Type: Bug
> Reporter: Owen O'Malley
>
> It would be good to cut down the size of the tools jar by excluding more of
> the transitive dependencies, especially through the hadoop jars.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)