[ https://issues.apache.org/jira/browse/HIVE-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188173#comment-13188173 ]
He Yongqiang commented on HIVE-2612:
------------------------------------

Yeah, sure. Internally we actually had some discussions around this, and I can write something about it. Right now, our main concern is that the cluster feature may not be useful for most people, so we want to know what other people think about the potential incompatibilities it could introduce. I actually sent out a discussion to dev@; copying it here:

"
We are planning to make Hive run across multiple data centers (physical clusters). We prefer to use the Hive metastore to provide a unified namespace. Tables/partitions can exist in more than one cluster, and one cluster is defined as the primary cluster. The primary cluster is a table-level property. A table T1's primary cluster being C1 means:

1) C1 contains all data that is available in all other clusters.
2) Writes to T1 are only allowed in this cluster, but we need to allow exceptions here.
3) New partitions are only allowed to be created in C1.
4) All data changes to T1 that happen in the primary cluster should be replicated to the other clusters, if there are any secondary clusters. But there should be a conf to disable this, as there are some exceptional situations.

The first thing that needs to be done is to give the Hive metastore a concept of cluster. That also means all Thrift calls to the metastore need to provide a cluster parameter. So we have three options here:

1) Add a cluster parameter to the existing Thrift interfaces; or
2) Add new interfaces that provide exactly the same functionality as the old ones but under a different name (maybe with an _on_cluster suffix) and with a cluster parameter; or
3) Reuse the database name as the cluster name, and allow a table to co-exist in multiple databases. But that requires promoting the table to a top-level citizen and demoting the database. For example, "show tables" used to scan all tables in the current db, but would now need to scan all tables in all databases.

We would like to get more ideas about which one to choose, and we are definitely open to other alternatives that we missed here. We are also looking for other systems that have solved similar problems. If anyone knows of such a system, we would like to hear about it. Appreciate that!
"

> support hive table/partitions coexistes in more than one clusters
> -----------------------------------------------------------------
>
>                 Key: HIVE-2612
>                 URL: https://issues.apache.org/jira/browse/HIVE-2612
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>
> 1) add cluster object into hive metastore
> 2) each partition/table has a creation cluster and a list of living clusters,
> and also data location in each cluster
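
As a rough illustration only, the data model in the issue description and the primary-cluster rules in the comment could be sketched as below. This is not Hive code: the class names, method names, the `replicate` flag, the `allow_exception` flag, the HDFS paths, and the partition spec are all hypothetical, chosen just to make the four rules concrete.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Cluster:
    """Hypothetical metastore cluster object (item 1 of the description)."""
    name: str


@dataclass
class TableEntry:
    """Hypothetical table record: creation cluster, primary cluster,
    and one data location per living cluster (item 2 of the description)."""
    name: str
    creation_cluster: Cluster
    primary_cluster: Cluster
    locations: dict = field(default_factory=dict)  # cluster name -> data location
    partitions: list = field(default_factory=list)

    def living_clusters(self):
        # A cluster is "living" for this table if it holds a copy of the data.
        return sorted(self.locations)

    def can_write(self, cluster, allow_exception=False):
        # Rule 2: writes only in the primary cluster, unless explicitly excepted.
        return allow_exception or cluster == self.primary_cluster

    def add_partition(self, cluster, spec):
        # Rule 3: new partitions may only be created in the primary cluster.
        if cluster != self.primary_cluster:
            raise PermissionError(
                f"partitions of {self.name} must be created in "
                f"{self.primary_cluster.name}"
            )
        self.partitions.append(spec)

    def replication_targets(self, replicate=True):
        # Rule 4: changes in the primary go to every secondary cluster,
        # unless replication is switched off by configuration.
        if not replicate:
            return []
        return [c for c in self.living_clusters()
                if c != self.primary_cluster.name]


# Hypothetical example: T1 lives in C1 (primary) and C2 (secondary).
c1, c2 = Cluster("C1"), Cluster("C2")
t1 = TableEntry(
    "T1",
    creation_cluster=c1,
    primary_cluster=c1,
    locations={"C1": "hdfs://c1/warehouse/t1", "C2": "hdfs://c2/warehouse/t1"},
)
t1.add_partition(c1, "ds=2012-01-01")
```

Keeping the per-cluster locations in one map means the list of living clusters falls out of the same field, so the two cannot drift apart. Whether the real metastore schema should store them together is a separate design question that this sketch does not settle.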