[ https://issues.apache.org/jira/browse/ORC-151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886915#comment-15886915 ]

Owen O'Malley commented on ORC-151:
-----------------------------------

[~istvan] It would be a lot of work to completely remove the Hadoop 
dependencies. If you want to investigate it, you should probably open a new 
JIRA. In particular, you'd need to:
* Remove the Hadoop dependency from hive-storage, which would require:
  * Providing alternatives to the uses of Writable without breaking 
compatibility. At the very least, you'll need new implementations of 
DecimalColumnVector, DateColumnVector, and TimestampColumnVector.
  * You'll need to replace the functionality of WritableUtils, but that should 
be pretty easy.
* If you are trying to get rid of problematic libraries, you really should 
remove the use of guava from hive-storage too. I thought I had removed those 
uses at one point, but they are still there.
* You can't change the definition of orc-core without breaking compatibility, 
but you could create a new module (orc-kernel?) that houses the Hadoop-clean 
code. 
* PhysicalWriter already avoids the FileSystem on the write side; you could try 
making a PhysicalReader for the read path.
* You'll need to deal with Configuration in a backwards compatible way.
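On the WritableUtils point above, a minimal sketch of what a Hadoop-free 
replacement for variable-length long encoding (the writeVLong/readVLong 
functionality) could look like, using only the JDK. Note this uses a plain 
zigzag base-128 varint for illustration, NOT Hadoop's exact VLong wire format; 
a real replacement would have to reproduce the existing on-disk encoding 
byte-for-byte.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/**
 * Sketch of Hadoop-free variable-length long encoding, covering the kind of
 * functionality WritableUtils provides today. Zigzag base-128 varint only;
 * not Hadoop's VLong format.
 */
public final class VarLong {
    // Encode a signed long: zigzag moves the sign into the low bit,
    // then 7 payload bits go out per byte, high bit = "more bytes follow".
    public static void write(ByteArrayOutputStream out, long value) {
        long v = (value << 1) ^ (value >> 63);      // zigzag encode
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));   // continuation flag set
            v >>>= 7;
        }
        out.write((int) v);                          // final byte, flag clear
    }

    // Decode one varint from the buffer's current position.
    public static long read(ByteBuffer in) {
        long v = 0;
        int shift = 0;
        long b;
        do {
            b = in.get() & 0xFF;
            v |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (v >>> 1) ^ -(v & 1);                 // zigzag decode
    }
}
```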

All of this without introducing a performance penalty for the Hadoop users. If 
your goal is just reducing the size of the jar, it doesn't seem worth the work.
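To make the PhysicalReader idea above concrete: a hypothetical sketch of a 
read-side counterpart to PhysicalWriter that hides the FileSystem behind a 
small seam, using only java.nio. The interface name, method shape, and the 
LocalPhysicalReader class are all made up for illustration; no such type 
exists in ORC today.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical read-side seam mirroring PhysicalWriter: callers ask for byte
 * ranges and never see Hadoop's FileSystem. Name and shape are illustrative.
 */
interface PhysicalReader extends AutoCloseable {
    long length() throws IOException;

    // Read exactly len bytes starting at the given file offset.
    ByteBuffer readFully(long offset, int len) throws IOException;
}

/** Local-file implementation backed by a FileChannel; no Hadoop involved. */
final class LocalPhysicalReader implements PhysicalReader {
    private final FileChannel channel;

    LocalPhysicalReader(Path path) throws IOException {
        this.channel = FileChannel.open(path, StandardOpenOption.READ);
    }

    @Override public long length() throws IOException {
        return channel.size();
    }

    @Override public ByteBuffer readFully(long offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            if (channel.read(buf, offset + buf.position()) < 0) {
                throw new IOException("unexpected end of file");
            }
        }
        buf.flip();
        return buf;
    }

    @Override public void close() throws IOException {
        channel.close();
    }
}
```

An HDFS-backed implementation would then live in a separate Hadoop-aware 
module, keeping the hypothetical orc-kernel free of the dependency.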

> Cut down on the size of the tools jar by excluding more
> -------------------------------------------------------
>
>                 Key: ORC-151
>                 URL: https://issues.apache.org/jira/browse/ORC-151
>             Project: Orc
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>
> It would be good to cut down the size of the tools jar by excluding more of 
> the transitive dependencies, especially through the hadoop jars.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
