[ https://issues.apache.org/jira/browse/SQOOP-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272111#comment-14272111 ]

Veena Basavaraj commented on SQOOP-1744:
----------------------------------------

Here are the high-level points we discussed today regarding the KiteConnector 
and HBase support.

Summary: There is already support for HDFS via the KiteConnector at this point. 
We discussed how to add HIVE/HBase support to Sqoop using the Kite SDK/CLI. We 
also discussed at length the limitations of the current implementation in 
handling failures at the MR engine in Sqoop. We think we have a good way to 
handle failure scenarios. More details on that below.


Question: Should we have a new connector for HIVE/HBASE via Kite?
Answer: No, we all agreed that using the URI in the FromJobConfig and 
ToJobConfig we can figure out the dataset type and then create the relevant 
dataset:
{code}
  @Input(size = 255, validators = {@Validator(DatasetURIValidator.class)})
  public String uri;
{code}
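
For context, a Kite dataset URI embeds the storage scheme, so a single 
connector can branch on it. A minimal sketch of that dispatch (not the actual 
connector code; the helper below is hypothetical):

{code}
// Hypothetical helper: pull the storage scheme out of a Kite dataset URI.
// Kite URIs have the form dataset:<scheme>:..., for example:
//   dataset:hdfs:/datasets/events
//   dataset:hive:default/events
//   dataset:hbase:zk1,zk2,zk3/events
private static String storageScheme(String uri) {
  // Drop the leading "dataset:" (or "view:") prefix, then take the next
  // colon-delimited token, which names the backing store.
  String rest = uri.replaceFirst("^(dataset|view):", "");
  int colon = rest.indexOf(':');
  return colon > 0 ? rest.substring(0, colon) : rest;
}
{code}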

Question: Should we expose any more From/To job configs related to column 
mapping for HBase?
Answer: From Ryan - No, rely on the URI, given that it is already set up with 
those details, so we most likely won't need to expose any other configs.
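
For illustration (the hosts and table name below are made up), an HBase 
dataset URI already names the ZooKeeper quorum and table, and the column 
mapping lives in the dataset's descriptor, so loading it with the Kite API 
needs nothing beyond the URI:

{code}
import org.apache.avro.generic.GenericRecord;
import org.kitesdk.data.Dataset;
import org.kitesdk.data.Datasets;

// Illustrative only: load an existing HBase-backed dataset by URI.
Dataset<GenericRecord> events = Datasets.load(
    "dataset:hbase:zk1.example.com,zk2.example.com/events",
    GenericRecord.class);
{code}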


Question: How do we handle the READ/FROM case, i.e. the partitioning strategy, 
for HBase via Kite?
Answer: From Ryan, piggyback on what is going to be done for HDFS; there is 
still an open ticket for this: 
https://issues.apache.org/jira/browse/SQOOP-1942, tackle this first. A 
placeholder sketch follows below.
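
Until SQOOP-1942 lands, the simplest placeholder is a single partition. A 
minimal sketch assuming the Sqoop 2 Partitioner SPI shape (KiteDatasetPartition 
and the config classes are assumed to be the connector's existing types):

{code}
import java.util.Collections;
import java.util.List;

import org.apache.sqoop.job.etl.Partition;
import org.apache.sqoop.job.etl.Partitioner;
import org.apache.sqoop.job.etl.PartitionerContext;

// Placeholder until SQOOP-1942 adds real split computation: emit a single
// partition covering the whole dataset, so FROM jobs still run (serially)
// against any Kite-backed store, HBase included.
public class KiteDatasetPartitioner extends Partitioner<LinkConfiguration, FromJobConfiguration> {
  @Override
  public List<Partition> getPartitions(PartitionerContext context,
      LinkConfiguration linkConfig, FromJobConfiguration fromJobConfig) {
    // A real implementation would split on HBase region boundaries (or
    // HDFS blocks) up to context.getMaxPartitions().
    return Collections.<Partition>singletonList(new KiteDatasetPartition());
  }
}
{code}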

Question: How do we do the WRITES/TO case for HBase via Kite?
Answer: Unlike HDFS, where we create temporary datasets and then merge them in 
the destroyer step, in the case of HBase we will be writing directly to the 
underlying HBase dataset. It is a naive implementation, but this is what we 
will do in phase 1. So in case of TASK/JOB failures in the MR engine there 
will be partial commits, and at this point that is a limitation. We do not 
have a clean solution to handle this yet.
So we basically make sure that the load step writes the datasets, and the 
destroyer-step merging is only run in the case of HDFS, as in the sketch below.
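
A sketch of that phase-1 split (the method shape follows the Sqoop 2 Destroyer 
SPI; the uri field mirrors the ToJobConfig shown earlier, and the two private 
helpers are hypothetical stand-ins):

{code}
import org.apache.sqoop.job.etl.Destroyer;
import org.apache.sqoop.job.etl.DestroyerContext;

// Phase-1 behavior: the destroyer merge only applies to HDFS, because
// HBase loaders write straight into the final dataset.
public class KiteToDestroyer extends Destroyer<LinkConfiguration, ToJobConfiguration> {
  @Override
  public void destroy(DestroyerContext context,
      LinkConfiguration linkConfig, ToJobConfiguration toJobConfig) {
    String uri = toJobConfig.toJobConfig.uri;
    if (uri.startsWith("dataset:hbase")) {
      // Nothing to merge or clean up: writes were direct, so a failed
      // job may have left partial commits (the phase-1 limitation).
      return;
    }
    if (context.isSuccess()) {
      mergeTemporaryDatasets(uri);
    } else {
      deleteTemporaryDatasets(uri);
    }
  }

  // Hypothetical helpers for the HDFS path.
  private void mergeTemporaryDatasets(String uri) { /* merge temp into final */ }
  private void deleteTemporaryDatasets(String uri) { /* drop temp datasets */ }
}
{code}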

Question: Having discussed HBase, are we sure the other connectors' 
implementations handle TASK failures well?
Answer: We concluded the default retries in the MR engine might lead to 
duplicates, so the rows written may end up far exceeding the rows read! We 
expose these counters today, so we may be able to track it.

Abe created an investigation ticket to verify whether dupes will be created 
today with MR task failures:
https://issues.apache.org/jira/browse/SQOOP-2000

Question: Should HIVE be done in the same connector as we are doing HBase?
Answer: Yes, everyone agreed; we should not create another connector for this.


Question: Long term, how are we planning to handle task failures for 
HDFS/HBase/Hive etc.?
I cannot completely write the details here, but at a high level the proposal 
is to provide hooks for the connectors in Sqoop to control task-level commits. 
Ryan is going to create a Sqoop ticket with his proposal soon; I will keep 
nudging him if he misses it, since [~abec] and Ryan believe it will solve a 
host of other scenarios as well.
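
Purely to sketch the shape such hooks might take (this is invented for 
illustration and is not Ryan's pending proposal), something akin to Hadoop's 
OutputCommitter, but owned by the connector:

{code}
// Invented for illustration only; the actual design will be in Ryan's
// ticket. The idea: the connector, not the MR engine, decides whether a
// task attempt's output becomes visible.
public interface ConnectorTaskCommitHook {
  // Success path: atomically publish this attempt's output.
  void commitTask(String taskAttemptId);

  // Failure/retry path: discard this attempt's output so MR retries
  // cannot produce duplicate rows.
  void abortTask(String taskAttemptId);
}
{code}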


> Kite Connector Support : Read/Write data to HBase via Kite SDK
> --------------------------------------------------------------
>
>                 Key: SQOOP-1744
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1744
>             Project: Sqoop
>          Issue Type: Bug
>          Components: connectors
>            Reporter: Qian Xu
>            Assignee: Veena Basavaraj
>             Fix For: 1.99.5
>
>
> Propose to read/write data into HBase via the Kite SDK hbase module
> http://www.slideshare.net/HBaseCon/ecosystem-session-5
> A detailed design wiki to support basic read/write and DFM is here
> https://cwiki.apache.org/confluence/display/SQOOP/Kite+Connector+Hbase+support


