[ https://issues.apache.org/jira/browse/SQOOP-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231872#comment-14231872 ]
Ryan Blue commented on SQOOP-1744:
----------------------------------
[~vinothchandar], there are a lot of trade-offs to get your data into Parquet
format when records might be changed. Can you put any bounds on what records
might change? If so, we have a lot more options. For example, if only records
created in the last 5 minutes might receive updates, then we can keep those in
an HBase table and copy 5-minute windows from it once we know that the records
aren't going to change.
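
As a rough sketch of that idea (not Sqoop or HBase code), the snippet below works out which 5-minute windows are safe to copy. It assumes that a record stops changing 5 minutes after it is created; the names WINDOW, UPDATE_BOUND, window_start, and closed_windows are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=5)        # window size from the example above
UPDATE_BOUND = timedelta(minutes=5)  # assumption: no updates after this age

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts):
    """Floor a timestamp to the start of its 5-minute window."""
    return EPOCH + ((ts - EPOCH) // WINDOW) * WINDOW


def closed_windows(first_ts, now):
    """Yield (start, end) for every window whose records can no longer change.

    A window [start, end) is closed once the newest record it could hold,
    created just before `end`, has aged past UPDATE_BOUND.
    """
    start = window_start(first_ts)
    while start + WINDOW + UPDATE_BOUND <= now:
        yield start, start + WINDOW
        start += WINDOW
```

Each (start, end) pair could then drive a time-bounded read of the HBase table, with the result written out as Parquet.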
bq. we can only convert the whole data set in HBase to Parquet every time, as I
understand
Actually, we can select a subset of the records in HBase and copy them to
Parquet. One big concern is having enough data, though. We generally want to
avoid small Parquet files.
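
One way to handle the small-file concern, sketched under assumptions: hold back windows until enough bytes have accumulated, then copy them together as one file. The 128 MB target and the plan_copies helper are hypothetical; nothing here is a Sqoop or Parquet API.

```python
MIN_FILE_BYTES = 128 * 1024 * 1024  # assumed target size, not from this thread


def plan_copies(window_sizes, min_bytes=MIN_FILE_BYTES):
    """Group consecutive windows so that each planned file reaches min_bytes.

    window_sizes: list of (window_id, size_in_bytes) pairs in time order.
    Returns (batches, leftover): the batches are ready to copy, and the
    leftover windows stay in HBase until more data arrives.
    """
    batches, current, current_bytes = [], [], 0
    for window_id, size in window_sizes:
        current.append(window_id)
        current_bytes += size
        if current_bytes >= min_bytes:
            batches.append(current)
            current, current_bytes = [], 0
    return batches, current
```

The trade-off is latency: a quiet period leaves the leftover windows in HBase longer before they show up in Parquet.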
> TO-side: Write data to HBase
> ----------------------------
>
> Key: SQOOP-1744
> URL: https://issues.apache.org/jira/browse/SQOOP-1744
> Project: Sqoop
> Issue Type: Sub-task
> Components: connectors
> Reporter: Qian Xu
> Assignee: Qian Xu
> Fix For: 1.99.5
>
>
> Propose to write data into HBase. Note that, unlike HDFS, HBase is
> append only. Merge does not work for HBase.