[
https://issues.apache.org/jira/browse/SQOOP-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230615#comment-14230615
]
Ryan Blue commented on SQOOP-1744:
----------------------------------
I see that this is a subtask of "Kite Connector Support", so kindly ignore my
last comment. :)
The reason [~stanleyxu2005] notes that merge isn't supported is that merge is an
HDFS dataset concept: we write to a temporary location and then merge the data
in by moving files in HDFS, which ensures all of the data is written to HDFS
before any of it is committed. For HBase, writes take effect as soon as the
data is sent to the server (usually batched and sent when a flush occurs). We
need to clearly define what should happen when a job has failures.
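For contrast, the HDFS-side merge/commit amounts to roughly the following (a
minimal sketch, not the actual Kite or Sqoop code; the staging and target paths
are placeholders): the job writes under a staging directory, and only a
successful job gets its files renamed into the dataset directory.
{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsMergeSketch {
  // Commit staged job output by renaming files into the target dataset directory.
  // Nothing becomes visible in the dataset until these renames run, so a job
  // that fails before commit leaves the target untouched.
  public static void commit(Configuration conf, Path staging, Path target) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    for (FileStatus stat : fs.listStatus(staging)) {
      Path dest = new Path(target, stat.getPath().getName());
      if (!fs.rename(stat.getPath(), dest)) {
        throw new IOException("Could not move " + stat.getPath() + " to " + dest);
      }
    }
  }
}
{code}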
If a failed job should leave no data in HBase, then we need to stage all of the
data and load it into HBase at once using
[{{HFileOutputFormat}}|https://hbase.apache.org/book/arch.bulk.load.html]. I
think this is the most reasonable approach, but it requires an update to Kite.
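As a rough illustration of that approach (a sketch only, using the bulk-load
API the linked book describes; the table name, paths, and mapper are
placeholders, not part of the connector), the job writes sorted HFiles to a
staging directory and only a fully successful job gets those files handed to
the region servers:
{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadSketch {

  // Hypothetical mapper that turns each input line into a Put; it stands in
  // for whatever record writer the connector would provide.
  public static class ImportMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      byte[] row = Bytes.toBytes(value.toString());
      Put put = new Put(row);
      put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), row);
      context.write(new ImmutableBytesWritable(row), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Path input = new Path("/tmp/sqoop-input");            // placeholder input
    Path staging = new Path("/tmp/sqoop-hbase-staging");  // placeholder staging dir

    Job job = Job.getInstance(conf, "sqoop-to-hbase-bulk-load");
    job.setJarByClass(BulkLoadSketch.class);
    job.setMapperClass(ImportMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);
    FileInputFormat.addInputPath(job, input);
    FileOutputFormat.setOutputPath(job, staging);

    HTable table = new HTable(conf, "my_table");          // placeholder table name
    // Sets HFileOutputFormat as the output format and configures the partitioner
    // and reducer so the job writes sorted HFiles to the staging directory
    // instead of sending live Puts to the region servers.
    HFileOutputFormat.configureIncrementalLoad(job, table);

    if (job.waitForCompletion(true)) {
      // Only after the entire job succeeds are the staged HFiles handed to the
      // regions, so a failed job leaves the table untouched.
      new LoadIncrementalHFiles(conf).doBulkLoad(staging, table);
    }
  }
}
{code}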
> TO-side: Write data to HBase
> ----------------------------
>
> Key: SQOOP-1744
> URL: https://issues.apache.org/jira/browse/SQOOP-1744
> Project: Sqoop
> Issue Type: Sub-task
> Components: connectors
> Reporter: Qian Xu
> Assignee: Qian Xu
> Fix For: 1.99.5
>
>
> Propose to write data into HBase. Note that unlike HDFS, HBase is
> append-only. Merge does not work for HBase.