[
https://issues.apache.org/jira/browse/SQOOP-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272111#comment-14272111
]
Veena Basavaraj commented on SQOOP-1744:
----------------------------------------
Here are the high-level points we discussed today regarding the KiteConnector
and HBase support.
Summary: There is already support for HDFS via the KiteConnector at this point.
We discussed how to add HIVE/HBase support to Sqoop using the Kite SDK/CLI. We
also discussed at length the limitations of the current implementation in
handling failures at the MR engine in Sqoop. We think we have a good way to
handle the failure scenarios. More details on that below.
Question: Should we have a new connector for HIVE/HBASE via Kite?
Answer: No, we all agreed that from the URI in the FromJobConfig and
ToJobConfig we can figure out the dataset type and then create the relevant
dataset:
{code}
@Input(size = 255, validators = {@Validator(DatasetURIValidator.class)})
public String uri;
{code}
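For illustration, a minimal sketch of how the connector could dispatch on the
URI scheme; the helper below is hypothetical, but the prefixes follow the Kite
SDK's dataset URI convention (dataset:hdfs:..., dataset:hive:...,
dataset:hbase:...):
{code}
// Hypothetical helper: derive the dataset flavor from a Kite dataset URI.
public enum DatasetType { HDFS, HIVE, HBASE }

public static DatasetType typeFromUri(String uri) {
  if (uri.startsWith("dataset:hbase:")) {
    return DatasetType.HBASE;
  } else if (uri.startsWith("dataset:hive:")) {
    return DatasetType.HIVE;
  } else if (uri.startsWith("dataset:hdfs:")) {
    return DatasetType.HDFS;
  }
  throw new IllegalArgumentException("Unsupported dataset URI: " + uri);
}
{code}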
Question: Should we expose any more From/To job configs related to column
mapping for HBase?
Answer: From Ryan - No. Rely on the URI, given that the dataset is already set
up with those details, so we most likely won't need to expose any other
configs.
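As a sketch of why no extra configs are needed (assuming the Kite SDK API,
where Datasets.load resolves an existing dataset by URI), the column mapping
already travels with the dataset descriptor:
{code}
import org.apache.avro.generic.GenericRecord;
import org.kitesdk.data.ColumnMapping;
import org.kitesdk.data.Dataset;
import org.kitesdk.data.Datasets;

// The HBase-backed dataset was created with its column mapping, so loading
// it by URI recovers the mapping without any extra Sqoop job config.
Dataset<GenericRecord> dataset = Datasets.load(uri, GenericRecord.class);
ColumnMapping mapping = dataset.getDescriptor().getColumnMapping();
{code}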
Question: How do we handle the READ/FROM case, i.e., the partitioning
strategy, for HBase via Kite?
Answer: From Ryan: piggyback on what is going to be done for HDFS. There is
still an open ticket for this, which should be tackled first:
https://issues.apache.org/jira/browse/SQOOP-1942
Question: How do we do the WRITE/TO case for HBase via Kite?
Answer: Unlike HDFS, where we create temporary datasets and then merge them in
the destroyer step, in the case of HBase we will write directly to the
underlying HBase dataset. It is a naive implementation, but this is what we
will do in phase 1. So in the case of TASK/JOB failures in the MR engine there
will be partial commits, and at this point that is a limitation; we do not
have a clean solution to handle this yet.
So we basically make sure that the load step writes the datasets, and the
destroyer-step merging is only run in the case of HDFS, as sketched below.
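A minimal sketch of that split, assuming the Kite SDK writer API (the load
method shape and the isHBase/temporaryDatasetUri helpers are hypothetical;
only the Datasets/DatasetWriter calls follow the Kite SDK):
{code}
import org.apache.avro.generic.GenericRecord;
import org.kitesdk.data.Dataset;
import org.kitesdk.data.DatasetWriter;
import org.kitesdk.data.Datasets;

// Hypothetical load-step sketch: HBase records go straight to the target
// dataset (so task failures can leave partial commits), while HDFS records
// go to a per-task temporary dataset that the destroyer merges later.
void load(String toUri, Iterable<GenericRecord> records) {
  String targetUri = isHBase(toUri) ? toUri : temporaryDatasetUri(toUri);
  Dataset<GenericRecord> dataset = Datasets.load(targetUri, GenericRecord.class);
  try (DatasetWriter<GenericRecord> writer = dataset.newWriter()) {
    for (GenericRecord record : records) {
      writer.write(record);
    }
  }
}
{code}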
Question: Having discussed HBase, are we sure the other connector
implementations handle TASK failures well?
Answer: We concluded that the default retries in the MR engine might lead to
duplicates, so the rows written may be far more than the rows read! We expose
these counters today, so we may be able to track it; a sketch of such a check
follows.
Abe created an investigation ticket to verify whether duplicates are in fact
created today with MR task failures:
https://issues.apache.org/jira/browse/SQOOP-2000
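Using the standard Hadoop counters API, the check could look like this (the
counter group and names below are illustrative placeholders, not the actual
Sqoop counter identifiers):
{code}
import java.io.IOException;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;

// After the job completes, compare rows read vs. rows written; written > read
// suggests duplicates introduced by MR task retries.
static void checkForDuplicates(Job job) throws IOException {
  Counters counters = job.getCounters();
  long read = counters.findCounter("SqoopCounters", "ROWS_READ").getValue();
  long written = counters.findCounter("SqoopCounters", "ROWS_WRITTEN").getValue();
  if (written > read) {
    System.err.printf("Possible duplicates: read=%d, written=%d%n", read, written);
  }
}
{code}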
Question: Should HIVE be done in the same connector as we are doing HBase?
Answer: Yes, everyone agreed; we should not create another connector for this.
Question: Long term, how are we planning to handle task failures for
HDFS/HBase/Hive etc.?
I cannot completely write the details here, but at a high level the proposal
is to provide hooks for the connectors in Sqoop to control task-level commits.
Ryan is going to create a Sqoop ticket with his proposal soon; I will keep
nudging him if he misses it, since [~abec] and Ryan believe it will solve a
host of other scenarios as well.
> Kite Connector Support : Read/Write data to HBase via Kite SDK
> --------------------------------------------------------------
>
> Key: SQOOP-1744
> URL: https://issues.apache.org/jira/browse/SQOOP-1744
> Project: Sqoop
> Issue Type: Bug
> Components: connectors
> Reporter: Qian Xu
> Assignee: Veena Basavaraj
> Fix For: 1.99.5
>
>
> Propose to read/ write data into HBase via the Kite SDK hbase module
> http://www.slideshare.net/HBaseCon/ecosystem-session-5
> A detailed design wiki to support basic read/ write and DFM is here
> https://cwiki.apache.org/confluence/display/SQOOP/Kite+Connector+Hbase+support