[
https://issues.apache.org/jira/browse/SQOOP-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231912#comment-14231912
]
Vinoth Chandar commented on SQOOP-1744:
---------------------------------------
[~rdblue]
>> Can you put any bounds on what records might change?
We have our own usage patterns, but I don't think we can expect only records
from the last 5 minutes to change, even for typical OLTP workloads, right
(e.g. an Uber table, a profile table, etc.)?
>> Actually, we can select a subset of the records in HBase and copy them to
>> Parquet
Not sure I explained myself clearly... let me take another shot.
Once we do a full fetch, we could do something like the below for the
subsequent incremental fetch.
(Assume we did a "select * from users;" and produced a number of Parquet files
that contain records from a User table, with rows organized by the table's
primary key, userid.)
1) Obtain all rows that changed since the last run.
2) Write those rows into HBase to merge them.
3) Then pull them out again and rewrite the affected Parquet files.
(A rough sketch of steps 1 and 2 follows below.)
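To make the flow concrete, here is a minimal sketch of steps 1 and 2, assuming
a hypothetical users table with a last_modified column, a made-up JDBC URL, and
an HBase table named "users" with a single column family "d". None of these
names come from Sqoop; this only shows the shape of the flow.
{code:java}
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class IncrementalFetch {
  public static void main(String[] args) throws Exception {
    // Time of the last successful run, e.g. "2014-12-01 00:00:00".
    Timestamp lastRun = Timestamp.valueOf(args[0]);

    Configuration conf = HBaseConfiguration.create();
    try (java.sql.Connection jdbc =
             DriverManager.getConnection("jdbc:mysql://dbhost/app");
         org.apache.hadoop.hbase.client.Connection hbase =
             ConnectionFactory.createConnection(conf);
         Table users = hbase.getTable(TableName.valueOf("users"))) {

      // Step 1: obtain all rows that changed since the last run.
      PreparedStatement ps = jdbc.prepareStatement(
          "SELECT userid, name, email FROM users WHERE last_modified > ?");
      ps.setTimestamp(1, lastRun);
      ResultSet rs = ps.executeQuery();

      // Step 2: write those rows into HBase to merge. A put with an existing
      // row key simply becomes the latest version of that row.
      while (rs.next()) {
        Put put = new Put(Bytes.toBytes(rs.getLong("userid")));
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("name"),
            Bytes.toBytes(rs.getString("name")));
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("email"),
            Bytes.toBytes(rs.getString("email")));
        users.put(put);
      }
      // Step 3 (not shown): the changed rows must still be scanned back out
      // and the affected Parquet files rewritten.
    }
  }
}
{code}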
But step 2 in this flow does not buy us anything, right? We still need to do
the work of identifying the affected Parquet files and overwriting only those.
That's why I was saying that only if you convert the whole dataset from HFiles
to Parquet do you get an out-of-the-box solution.
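To illustrate the work that step 2 does not remove, here is a hypothetical
sketch of mapping changed primary keys back to the Parquet files that contain
them. The per-file key index is an assumption for illustration; in practice it
could be built from the min/max userid statistics in each file's footer.
{code:java}
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class AffectedFiles {

  // Hypothetical index: first userid covered by each Parquet file -> file
  // path. Files are assumed sorted by userid and non-overlapping.
  static final NavigableMap<Long, String> fileByStartKey = new TreeMap<>();

  // Given the keys that changed since the last run, return the Parquet files
  // that must be rewritten. This lookup is needed whether or not the changed
  // rows took a detour through HBase first.
  static Set<String> affectedFiles(List<Long> changedKeys) {
    Set<String> files = new TreeSet<>();
    for (long key : changedKeys) {
      // floorEntry finds the file whose key range contains this userid.
      Map.Entry<Long, String> e = fileByStartKey.floorEntry(key);
      if (e != null) {
        files.add(e.getValue());
      }
    }
    return files;
  }
}
{code}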
Maybe I am missing something?
> TO-side: Write data to HBase
> ----------------------------
>
> Key: SQOOP-1744
> URL: https://issues.apache.org/jira/browse/SQOOP-1744
> Project: Sqoop
> Issue Type: Sub-task
> Components: connectors
> Reporter: Qian Xu
> Assignee: Qian Xu
> Fix For: 1.99.5
>
>
> Propose to write data into HBase. Note that, different from HDFS, HBase is
> append-only. Merge does not work for HBase.