[
https://issues.apache.org/jira/browse/FALCON-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792056#comment-13792056
]
Venkatesh Seetharam edited comment on FALCON-93 at 10/10/13 11:07 PM:
----------------------------------------------------------------------
bq. Can prefix be "falcon.source." & "falcon.target." instead of just source &
target?
Yes.
bq. Looks like all tables go through the same export path. Can export & import
be avoided for external tables or does the export/import already take care of
the fact that table is external and allow you to short-circuit this?
Good question. This is akin to a split-brain problem. Currently, both table
types go through the same export/import path. In the future, export and import
can employ HDFS snapshots, making replication zero-copy and hence extremely
fast and efficient. IMHO, that would be the place to address this issue.
bq. Seems to be using distcp-v1. This is not desirable.
Yes. I had mentioned in a previous comment that I could not get FeedReplicator
to work. I will try using FeedReplicator again.
bq. Looks like scenario where data from multiple sources each owning a
partition getting merged in the target cluster isn't implemented, as the export
need to be specific to the partition against each of the source cluster. Please
confirm.
This is not an issue with HCatalog since the complete partition specification
is part of the URI. Say there are 2 partitions, date and region. The URI will
be: {code}hcat://localhost:9083/database/table/ds=${date}/region=${region}{code}
Makes sense?
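To illustrate how the complete partition specification composes into the feed URI, here is a minimal sketch (the helper name and partition values are illustrative, not part of Falcon's API; only the `hcat://host:port/database/table/key=value/...` shape comes from the example above):
{code}
# Sketch: composing an HCatalog feed URI that carries the full partition spec.
# hcat_uri is a hypothetical helper; host, database, table, and values are
# illustrative.
def hcat_uri(host, port, database, table, partitions):
    """Build an hcat:// URI whose path includes every partition key=value pair."""
    spec = "/".join("{0}={1}".format(k, v) for k, v in partitions.items())
    return "hcat://{0}:{1}/{2}/{3}/{4}".format(host, port, database, table, spec)

uri = hcat_uri("localhost", 9083, "database", "table",
               {"ds": "2013-10-10", "region": "us-west"})
# -> hcat://localhost:9083/database/table/ds=2013-10-10/region=us-west
{code}
Because each source cluster exports against its own fully-qualified partition spec, the merged target table keeps the partitions disjoint.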
> Replication to handle hive table replication
> --------------------------------------------
>
> Key: FALCON-93
> URL: https://issues.apache.org/jira/browse/FALCON-93
> Project: Falcon
> Issue Type: Sub-task
> Affects Versions: 0.3
> Reporter: Venkatesh Seetharam
> Assignee: Venkatesh Seetharam
> Attachments: FALCON-93.patch, FALCON-93-r1.patch
>
>
> Data and metadata to be replicated atomically.
--
This message was sent by Atlassian JIRA
(v6.1#6144)