[
https://issues.apache.org/jira/browse/HBASE-26451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446094#comment-17446094
]
Peter Somogyi commented on HBASE-26451:
---------------------------------------
Hi [~xdosis],
Import adds all the HFiles to HBase with the timestamps the data was originally
inserted with. HBase _"merges"_ all the HFiles when you run a request and
returns the _latest_ data. Since you had the same HFiles twice, the only
difference was the delete tombstone for a single rowkey. Its timestamp was the
latest, so HBase did not give you back that row.
It could look something like this:
1. Original state in table:
||rowkey||column family||timestamp||value||
|r1|f1|1|a|
|r2|f1|2|b|
|r3|f1|3|c|
2. Run export
3. Delete r2
||rowkey||column family||timestamp||value||
|r1|f1|1|a|
|r2|f1|2|b|
|r3|f1|3|c|
|r2|f1|4|<delete tombstone>|
4. Run import
||rowkey||column family||timestamp||value||
|r1|f1|1|a|
|r2|f1|2|b|
|r3|f1|3|c|
|r2|f1|4|<delete tombstone>|
|r1|f1|1|a|
|r2|f1|2|b|
|r3|f1|3|c|
From this, if you run a scan on the table, HBase will give you back only the
following:
||rowkey||column family||timestamp||value||
|r1|f1|1|a|
|r3|f1|3|c|
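The merge described above can be modelled with a small Python sketch. This is
not HBase code. It is a simplified illustration that assumes a delete tombstone
masks every cell of the same rowkey and column family whose timestamp is less
than or equal to the tombstone's. The names Cell, TOMBSTONE and scan are
hypothetical and exist only for this example.

```python
from typing import NamedTuple, Optional

# Hypothetical sentinel standing in for a delete tombstone (not an HBase API).
TOMBSTONE = None


class Cell(NamedTuple):
    row: str
    family: str
    timestamp: int
    value: Optional[str]  # TOMBSTONE marks a delete marker


def scan(cells):
    """Return the visible cells, one per (row, family), sorted by rowkey."""
    # Find the newest tombstone timestamp for each (row, family).
    deleted_at = {}
    for c in cells:
        if c.value is TOMBSTONE:
            key = (c.row, c.family)
            deleted_at[key] = max(deleted_at.get(key, c.timestamp), c.timestamp)

    # Keep the newest put that is not masked by a tombstone.
    latest = {}
    for c in cells:
        if c.value is TOMBSTONE:
            continue
        key = (c.row, c.family)
        if key in deleted_at and c.timestamp <= deleted_at[key]:
            continue  # older than (or as old as) the tombstone: hidden
        if key not in latest or c.timestamp > latest[key].timestamp:
            latest[key] = c
    return [latest[k] for k in sorted(latest)]


# 1. Original state, which is also what the export contains.
exported = [
    Cell("r1", "f1", 1, "a"),
    Cell("r2", "f1", 2, "b"),
    Cell("r3", "f1", 3, "c"),
]
# 3. Delete r2: a tombstone with a newer timestamp is added.
table = exported + [Cell("r2", "f1", 4, TOMBSTONE)]
# 4. Import: the exported cells come back with their original timestamps.
table_after_import = table + exported

visible = scan(table_after_import)
for cell in visible:
    print(cell.row, cell.family, cell.timestamp, cell.value)
```

In this model the re-imported r2 cell still carries timestamp 2, which is older
than the tombstone at timestamp 4, so it stays hidden.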
> Hbase Export/Import via MapReduce job
> -------------------------------------
>
> Key: HBASE-26451
> URL: https://issues.apache.org/jira/browse/HBASE-26451
> Project: HBase
> Issue Type: Bug
> Components: backup&restore
> Affects Versions: 2.0.1
> Reporter: Christos Dosis
> Priority: Major
>
> Hi Hbase support team,
>
> While using the MapReduce job with export/import commands we have the below
> behaviour.
>
> {+}Step1{+}: I have a hbase table(tsdb) with 3 rows. and export the table
> like below:
> /opt/hbase/bin/hbase org.apache.hadoop.hbase.mapreduce.Export tsdb
> file:///opt/hbase/backup/tsdb
> {+}Step2{+}: Then I delete 1 row.
> {+}Step3{+}: Then I import tsdb table from the exported data from Step 1.
> /opt/hbase/bin/hbase org.apache.hadoop.hbase.mapreduce.Import tsdb
> file:///opt/hbase/backup/tsdb
>
> There are still only 2 rows in the table. Is this a valid behaviour?
>
> Br,
> Chris