Hello HBase Community,

What is the suggested approach for incremental import from HBase to HDFS? For RDBMS to HDFS, Sqoop provides support with a script like the one below:
sqoop job --create myssb1 -- import \
  --connect jdbc:mysql://<hostname>:<port>/sakila \
  --username admin --password admin \
  --driver=com.mysql.jdbc.Driver \
  --query 'SELECT address_id, address, district, city_id, postal_code, alast_update, cityid, city, country_id, clast_update FROM (SELECT a.address_id as address_id, a.address as address, a.district as district, a.city_id as city_id, a.postal_code as postal_code, a.last_update as alast_update, c.city_id as cityid, c.city as city, c.country_id as country_id, c.last_update as clast_update FROM sakila.address a INNER JOIN sakila.city c ON a.city_id=c.city_id) as sub WHERE $CONDITIONS' \
  --incremental lastmodified --check-column alast_update --last-value 1900-01-01 \
  --target-dir /user/cloudera/ssb7 \
  --hive-import --hive-table test.sakila -m 1 \
  --hive-drop-import-delims --map-column-java address=String

(Note: the query is wrapped in single quotes so the shell does not expand $CONDITIONS before Sqoop sees it.)

Thanks.

On Wed, Dec 21, 2016 at 3:58 PM, Chetan Khatri <[email protected]> wrote:

> Hello Guys,
>
> I would like to understand different approaches for distributed incremental
> load from HBase. Is there any *tool / incubator tool* which satisfies this
> requirement?
>
> *Approach 1:*
>
> Write a Kafka producer, maintain a column flag for events manually, and
> ingest with LinkedIn Gobblin to HDFS / S3.
>
> *Approach 2:*
>
> Run a scheduled Spark job - read from HBase, do the transformations, and
> maintain the flag column at the HBase level.
>
> In both approaches I need to maintain column-level flags, such as: 0 -
> default, 1 - sent, 2 - sent and acknowledged. So next time the producer
> takes another batch of 1000 rows where the flag is 0 or 1.
>
> I am looking for a best-practice approach with any distributed tool.
>
> Thanks.
>
> - Chetan Khatri
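For what it's worth, the flag-column bookkeeping both approaches rely on can be sketched in a few lines. This is a hypothetical illustration only: a plain Python dict stands in for the HBase table, and all function names (next_batch, mark_sent, mark_acked) are made up for the example, not part of any HBase or Gobblin API.

```python
# Flag states as described in the thread:
# 0 = default (never sent), 1 = sent, 2 = sent and acknowledged.
FLAG_DEFAULT, FLAG_SENT, FLAG_ACKED = 0, 1, 2


def next_batch(table, batch_size=1000):
    """Return row keys whose flag is 0 or 1, i.e. rows the producer
    should (re)send: new rows plus sent-but-unacknowledged rows."""
    pending = [key for key, row in sorted(table.items())
               if row["flag"] in (FLAG_DEFAULT, FLAG_SENT)]
    return pending[:batch_size]


def mark_sent(table, keys):
    """After the producer emits a batch, flip those rows to 'sent'."""
    for key in keys:
        table[key]["flag"] = FLAG_SENT


def mark_acked(table, keys):
    """On downstream acknowledgement, flip rows to 'sent and acknowledged'
    so they drop out of future batches."""
    for key in keys:
        table[key]["flag"] = FLAG_ACKED
```

In a real deployment the scan-and-update would of course go through the HBase client (or a Spark HBase connector) instead of a dict, but the state machine - and the fact that rows stuck at flag 1 get retried until acknowledged - stays the same.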
