[ 
https://issues.apache.org/jira/browse/SQOOP-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231872#comment-14231872
 ] 

Ryan Blue commented on SQOOP-1744:
----------------------------------

[~vinothchandar], there are a lot of trade-offs in getting your data into 
Parquet format when records might change. Can you put any bounds on which 
records might change? If so, we have a lot more options. For example, if only 
records created in the last 5 minutes might receive updates, then we can keep 
those in an HBase table and copy 5-minute windows from it once we know that 
the records aren't going to change.
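
A rough sketch of the windowing idea, in plain Python with no HBase or Parquet 
calls. The names closed_windows, WINDOW and MUTABLE_FOR are hypothetical, and 
the 5-minute mutability bound is the example figure from above, not a 
measured value. It only works out which windows are safe to copy; reading 
them from HBase and writing Parquet is left out.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=5)       # export granularity (example value)
MUTABLE_FOR = timedelta(minutes=5)  # assumed bound on how long a record may change


def closed_windows(last_exported_end, now):
    """Return (start, end) pairs for windows whose records can no longer change.

    A window [start, end) is treated as closed once end + MUTABLE_FOR <= now,
    because the newest record in it was created just before `end`.
    """
    windows = []
    start = last_exported_end
    while start + WINDOW + MUTABLE_FOR <= now:
        windows.append((start, start + WINDOW))
        start += WINDOW
    return windows


if __name__ == "__main__":
    t0 = datetime(2014, 12, 2, 12, 0, tzinfo=timezone.utc)
    for start, end in closed_windows(t0, t0 + timedelta(minutes=17)):
        print(start.isoformat(), "->", end.isoformat())
```

Each returned pair could then drive a time-bounded read of the HBase table, 
with the end of the last pair remembered as the next starting point.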

bq. we can only convert the whole data set in HBase to Parquet every time, as I 
understand

Actually, we can select a subset of the records in HBase and copy them to 
Parquet. One big concern is having enough data, though. We generally want to 
avoid small Parquet files.

> TO-side: Write data to HBase
> ----------------------------
>
>                 Key: SQOOP-1744
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1744
>             Project: Sqoop
>          Issue Type: Sub-task
>          Components: connectors
>            Reporter: Qian Xu
>            Assignee: Qian Xu
>             Fix For: 1.99.5
>
>
> Propose to write data into HBase. Note that different to HDFS, HBase is 
> append only. Merge does not work for HBase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
