[jira] [Updated] (HIVE-7973) Hive Replication Support

Sushanth Sowmyan (JIRA) Fri, 17 Apr 2015 12:07:59 -0700

     [ 
https://issues.apache.org/jira/browse/HIVE-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sushanth Sowmyan updated HIVE-7973:
-----------------------------------
    Description: 
Replication is a common need in many database management systems, and 
it's important for Hive to evolve support for such a tool as part of its 
ecosystem. Hive already supports the EXPORT and IMPORT commands, which can be 
used to dump out tables, distcp them to another cluster, and import/create 
from that. If we had a mechanism by which exports and imports could be 
automated, it would establish the base on which replication can be developed.
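
As a rough illustration of the manual flow described above, here is a minimal 
Java sketch that only builds the three commands as strings. The class name 
ManualTableCopy, the table name, the namenode URIs and the staging directory 
are all hypothetical; the sketch assumes the basic forms EXPORT TABLE ... TO, 
hadoop distcp <src> <dst>, and IMPORT TABLE ... FROM, and it does not run 
anything against a cluster.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hive: it only formats the three manual
// steps (export, copy, import) for a single table.
final class ManualTableCopy {

    // HiveQL run on the source cluster: dump table data and metadata.
    static String exportStatement(String table, String stagingDir) {
        return "EXPORT TABLE " + table + " TO '" + stagingDir + "'";
    }

    // Shell command that copies the exported directory between clusters.
    static String distcpCommand(String srcUri, String dstUri) {
        return "hadoop distcp " + srcUri + " " + dstUri;
    }

    // HiveQL run on the destination cluster: create/load from the dump.
    static String importStatement(String table, String stagingDir) {
        return "IMPORT TABLE " + table + " FROM '" + stagingDir + "'";
    }

    // The ordered steps that an automated mechanism would have to drive.
    static List<String> plan(String table, String srcNameNode,
                             String dstNameNode, String stagingDir) {
        return Arrays.asList(
            exportStatement(table, stagingDir),
            distcpCommand(srcNameNode + stagingDir, dstNameNode + stagingDir),
            importStatement(table, stagingDir));
    }

    public static void main(String[] args) {
        // All values below are made-up examples.
        for (String step : plan("sales", "hdfs://source-nn:8020",
                                "hdfs://dest-nn:8020", "/tmp/repl/sales")) {
            System.out.println(step);
        }
    }
}
```

Each of these steps is currently issued by hand, which is the part that 
automation would take over.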

One place where this kind of automation can be developed is with the aid of the 
HiveMetaStoreEventHandler mechanisms, to generate notifications when certain 
changes are committed to the metastore, and then translate those notifications 
to export actions, distcp actions and import actions on another cluster.
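
To make the translation step concrete, here is a minimal Java sketch under 
stated assumptions. None of these types come from Hive or HCatalog: 
ReplicationSketch, Notification, Action, EventType and ActionKind are 
hypothetical stand-ins, and the choice of which events trigger replication 
is only an example. It shows one notification being turned into an ordered 
export, distcp, import sequence.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: these types are illustrations, not Hive classes.
final class ReplicationSketch {

    enum EventType { CREATE_TABLE, ADD_PARTITION, OTHER }

    enum ActionKind { EXPORT, DISTCP, IMPORT }

    // A metastore change notification, reduced to what the sketch needs.
    static final class Notification {
        final EventType type;
        final String db;
        final String table;

        Notification(EventType type, String db, String table) {
            this.type = type;
            this.db = db;
            this.table = table;
        }
    }

    // One unit of work for the replication driver.
    static final class Action {
        final ActionKind kind;
        final String target;

        Action(ActionKind kind, String target) {
            this.kind = kind;
            this.target = target;
        }

        @Override
        public String toString() {
            return kind + " " + target;
        }
    }

    // Translate a notification into the ordered actions to replay it.
    // Event types the sketch does not handle produce no actions.
    static List<Action> translate(Notification n) {
        if (n.type == EventType.OTHER) {
            return Collections.emptyList();
        }
        String name = n.db + "." + n.table;
        return Arrays.asList(
            new Action(ActionKind.EXPORT, name),   // on the source cluster
            new Action(ActionKind.DISTCP, name),   // between the clusters
            new Action(ActionKind.IMPORT, name));  // on the destination
    }

    public static void main(String[] args) {
        Notification n =
            new Notification(EventType.ADD_PARTITION, "default", "sales");
        for (Action a : translate(n)) {
            System.out.println(a);
        }
    }
}
```

Keeping the translation separate from the code that executes the actions 
would let the same library be reused if the copy step later changes.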

Part of that already exists in the Notification system that is part of 
hcatalog-server-extensions. Initially, this was developed to be able to trigger 
a JMS notification, which an Oozie workflow can use to start off actions 
keyed on the finishing of a job that used HCatalog to write to a table. While 
this currently lives under hcatalog, the primary reason for its existence has a 
scope well past hcatalog alone, and it can be used as-is without the use of 
HCatalog IF/OF. This can be extended with the help of a library which does 
the aforementioned translation. I also think that these sections should live 
in a core Hive module, rather than being tucked away inside hcatalog.

Once we have rudimentary support for table & partition replication, we can then 
move on to further requirements of replication, such as metadata replication 
(for example, replication of changes to roles), and/or optimizing away the 
requirement to distcp by using webhdfs instead.

This Story tracks all the bits that go into development of such a system - I'll 
create multiple smaller tasks inside this as we go on.

Please also see HIVE-10264 for documentation-related links for this, and 
https://cwiki.apache.org/confluence/display/Hive/HiveReplicationDevelopment for 
the associated wiki page (currently in progress).



> Hive Replication Support
> ------------------------
>
>                 Key: HIVE-7973
>                 URL: https://issues.apache.org/jira/browse/HIVE-7973
>             Project: Hive
>          Issue Type: Bug
>          Components: Import/Export
>            Reporter: Sushanth Sowmyan



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
