[ 
https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742914#comment-15742914
 ] 

Owen O'Malley commented on HIVE-14007:
--------------------------------------

.bq
The other thing I think we need community wide clarity on before you rip out 
orc is how we’re going to keep developing hive afterwards. Right now there’s a 
cyclic dependency. Hive -> ORC -> Hive - because of a shared storage api.

There is agreement within Hive to release the storage-api independently from 
Hive.  That would break the cycle and allow a non-cyclic release process. I'll 
file a Hive jira to do that work. Avoiding having two copies of code makes the 
whole ecosystem stronger by making sure that fixes get applied everywhere. I'd 
suggest leaving storage-api in the Hive source tree rather than making its own 
git repository. 

.bq
There are features that touch all three. And it turns out these are more 
frequent than expected. 

They come in waves. In the last three months, there have been 2 changes to 
storage-api.
Most of the patches are in either storage-api or ORC.  For example, HIVE-14453 
only touches ORC.

.bq
How do you propose to handle development and release of these features given 
the cyclic dependency? How do you work out feature branches/ snapshots?

For changes that touch one or the other, you'd commit the relevant change and 
release either storage-api or ORC and have a jira that updates the version in 
Hive. In the worst case, where the change spreads among the three artifacts, 
you would:

* commit to storage-api & ORC
* release them
* upgrade the pom in Hive
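
As a rough sketch of that last step: assuming Hive's root pom pins both 
dependencies through version properties, the upgrade jira would amount to a 
change like the one below. The property names and the X.Y.Z placeholders are 
illustrative assumptions, not actual release numbers.

```xml
<!-- Hypothetical fragment of Hive's root pom.xml; the property names and
     versions are placeholders for illustration only. -->
<properties>
  <!-- bump to the newly released storage-api version -->
  <storage-api.version>X.Y.Z</storage-api.version>
  <!-- bump to the newly released ORC version -->
  <orc.version>X.Y.Z</orc.version>
</properties>
```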

.bq
If a successful feature commit requires sequential hive and orc releases, then 
that means minimum several months before commit and that's not great. How will 
this be done?

No, ORC releases typically take 3 days. Storage API is much simpler and should 
also take 3 days. By being much smaller and more focused, they are much more 
nimble. Furthermore, the two votes could completely overlap, so the total time 
to get the change into Hive would be roughly 3 days. 

.bq
Looking over the PMC and committer lists in ORC it looks like many people 
working on ACID, vectorization or llap will lose the ability to do what they 
are doing today with this change.

When we set up the ORC project, we were pretty inclusive in the committer list 
and we continue to add new committers and PMC members. I'll take a look at the 
contributors to the Hive ORC module to look for new committers.

> Replace ORC module with ORC release
> -----------------------------------
>
>                 Key: HIVE-14007
>                 URL: https://issues.apache.org/jira/browse/HIVE-14007
>             Project: Hive
>          Issue Type: Bug
>          Components: ORC
>    Affects Versions: 2.2.0
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 2.2.0
>
>         Attachments: HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch, 
> HIVE-14007.patch
>
>
> This completes moving the core ORC reader & writer to the ORC project.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
