[
https://issues.apache.org/jira/browse/FALCON-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792056#comment-13792056
]
Venkatesh Seetharam edited comment on FALCON-93 at 10/10/13 11:07 PM:
----------------------------------------------------------------------
bq. Can prefix be "falcon.source." & "falcon.target." instead of just source &
target?
Yes.
bq. Looks like all tables go through the same export path. Can export & import
be avoided for external tables or does the export/import already take care of
the fact that table is external and allow you to short-circuit this?
Good question. This is akin to a split-brain problem. Currently, both table
types go through the same export/import path. In the future, export and import
can employ HDFS snapshots, making replication zero-copy and hence extremely
fast and efficient. IMHO, that would be the place to address this issue.
bq. Seems to be using distcp-v1. This is not desirable.
Yes. I had mentioned in a previous comment that I could not get FeedReplicator
to work. I will try using FeedReplicator again.
bq. Looks like scenario where data from multiple sources each owning a
partition getting merged in the target cluster isn't implemented, as the export
need to be specific to the partition against each of the source cluster. Please
confirm.
This is not an issue with HCatalog since the complete partition specification
is part of the URI. Say there are 2 partitions, date and region. The URI will
be: {code}hcat://localhost:9083/database/table/ds=${date}/region=${region}{code}
Makes sense?
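To illustrate how the complete partition specification composes into the feed URI, here is a minimal sketch (the helper name and partition values are illustrative, not part of Falcon's API; only the `hcat://host:port/database/table/key=value/...` shape comes from the example above):
{code}
# Sketch: composing an HCatalog feed URI that carries the full partition spec.
# hcat_uri is a hypothetical helper; host, database, table, and values are
# illustrative.
def hcat_uri(host, port, database, table, partitions):
    """Build an hcat:// URI whose path includes every partition key=value pair."""
    spec = "/".join("{0}={1}".format(k, v) for k, v in partitions.items())
    return "hcat://{0}:{1}/{2}/{3}/{4}".format(host, port, database, table, spec)

uri = hcat_uri("localhost", 9083, "database", "table",
               {"ds": "2013-10-10", "region": "us-west"})
# -> hcat://localhost:9083/database/table/ds=2013-10-10/region=us-west
{code}
Because each source cluster exports against its own fully-qualified partition spec, the merged target table keeps the partitions disjoint.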
> Replication to handle hive table replication
> --------------------------------------------
>
> Key: FALCON-93
> URL: https://issues.apache.org/jira/browse/FALCON-93
> Project: Falcon
> Issue Type: Sub-task
> Affects Versions: 0.3
> Reporter: Venkatesh Seetharam
> Assignee: Venkatesh Seetharam
> Attachments: FALCON-93.patch, FALCON-93-r1.patch
>
>
> Data and metadata to be replicated atomically.
--
This message was sent by Atlassian JIRA
(v6.1#6144)