[
https://issues.apache.org/jira/browse/HBASE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007035#comment-13007035
]
Subbu M Iyer commented on HBASE-2495:
-------------------------------------
===================================================================
Test setup and validation of the RegEx-based and Prefix-based exports
===================================================================
create 'mytest', 'cf1'
put 'mytest', 'row_11', 'cf1:col1', 'sun'
put 'mytest', 'row_22', 'cf1:col1', 'moon'
put 'mytest', 'row_33', 'cf1:col1', 'mars'
put 'mytest', 'row_44', 'cf1:col1', 'mercury'
put 'mytest', 'row_55', 'cf1:col1', 'jupiter'
put 'mytest', 'row_66', 'cf1:col1', 'venus'
put 'mytest', 'row_77', 'cf1:col1', 'saturn'
put 'mytest', 'row_88', 'cf1:col1', 'raghu'
put 'mytest', 'row_99', 'cf1:col1', 'kethu'
# Rows with prefix
put 'mytest', 'prefix_row_11', 'cf1:col1', 'prefix_sun'
put 'mytest', 'prefix_row_22', 'cf1:col1', 'prefix_moon'
put 'mytest', 'prefix_row_33', 'cf1:col1', 'prefix_mars'
put 'mytest', 'prefix_row_44', 'cf1:col1', 'prefix_mercury'
put 'mytest', 'prefix_row_55', 'cf1:col1', 'prefix_jupiter'
create 'planets', 'cf1'
hbase(main):021:0> scan 'mytest'
ROW              COLUMN+CELL
 prefix_row_11   column=cf1:col1, timestamp=1300209089377, value=prefix_sun
 prefix_row_22   column=cf1:col1, timestamp=1300209089497, value=prefix_moon
 prefix_row_33   column=cf1:col1, timestamp=1300209089609, value=prefix_mars
 prefix_row_44   column=cf1:col1, timestamp=1300209089786, value=prefix_mercury
 prefix_row_55   column=cf1:col1, timestamp=1300209090453, value=prefix_jupiter
 row_11          column=cf1:col1, timestamp=1300208484184, value=sun
 row_22          column=cf1:col1, timestamp=1300208500801, value=moon
 row_33          column=cf1:col1, timestamp=1300208510790, value=mars
 row_44          column=cf1:col1, timestamp=1300208525242, value=mercury
 row_55          column=cf1:col1, timestamp=1300208643830, value=jupiter
 row_66          column=cf1:col1, timestamp=1300208643961, value=venus
 row_77          column=cf1:col1, timestamp=1300208644080, value=saturn
 row_88          column=cf1:col1, timestamp=1300208644220, value=raghu
 row_99          column=cf1:col1, timestamp=1300208645195, value=kethu
===================================================================================================
Export of all rows matching the RegEx filter (RegEx = row) - 14 matching rows
===================================================================================================
java -cp hbase-0.91.0-SNAPSHOT.jar:./lib/hadoop-0.20.1-core.jar:./lib/commons-logging-1.1.1.jar:./lib/commons-cli-1.2.jar:./lib/zookeeper-3.3.2.jar:./lib/log4j-1.2.16.jar:./lib/commons-httpclient-3.1.jar \
  org.apache.hadoop.hbase.mapreduce.Export \
  mytest /work/HBaseExport/planets/regex 1 0 9223372036854775807 ^row
11/03/15 10:12:14 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0'
done.
11/03/15 10:12:15 INFO mapred.JobClient: map 100% reduce 0%
11/03/15 10:12:15 INFO mapred.JobClient: Job complete: job_local_0001
11/03/15 10:12:15 INFO mapred.JobClient: Counters: 4
11/03/15 10:12:15 INFO mapred.JobClient: FileSystemCounters
11/03/15 10:12:15 INFO mapred.JobClient: FILE_BYTES_READ=2436502
11/03/15 10:12:15 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2477330
11/03/15 10:12:15 INFO mapred.JobClient: Map-Reduce Framework
11/03/15 10:12:15 INFO mapred.JobClient: Map input records=14
11/03/15 10:12:15 INFO mapred.JobClient: Spilled Records=0
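For context: the positional arguments above are the table, output directory, versions, start time, end time, and the new optional row-filter argument (^row). Because the filter argument starts with ^, it is treated as a regular expression against the row key. Roughly, the corresponding Scan-side setup would look like the snippet below (an illustration using the stock HBase filter classes, not the exact patch code):

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;

// Illustration only: a RowFilter with a RegexStringComparator keeps the
// rows whose key matches the given regular expression.
Scan s = new Scan();
s.setFilter(new RowFilter(CompareOp.EQUAL, new RegexStringComparator("^row")));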
===================================================================================================
Export of all rows matching the row-key prefix (Prefix = prefix) - 5 matching rows
===================================================================================================
java -cp hbase-0.91.0-SNAPSHOT.jar:./lib/hadoop-0.20.1-core.jar:./lib/commons-logging-1.1.1.jar:./lib/commons-cli-1.2.jar:./lib/zookeeper-3.3.2.jar:./lib/log4j-1.2.16.jar:./lib/commons-httpclient-3.1.jar \
  org.apache.hadoop.hbase.mapreduce.Export \
  mytest /work/HBaseExport/planets/prefix 1 0 9223372036854775807 prefix
11/03/15 10:14:09 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0'
done.
11/03/15 10:14:10 INFO mapred.JobClient: map 100% reduce 0%
11/03/15 10:14:10 INFO mapred.JobClient: Job complete: job_local_0001
11/03/15 10:14:10 INFO mapred.JobClient: Counters: 4
11/03/15 10:14:10 INFO mapred.JobClient: FileSystemCounters
11/03/15 10:14:10 INFO mapred.JobClient: FILE_BYTES_READ=2436418
11/03/15 10:14:10 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2476581
11/03/15 10:14:10 INFO mapred.JobClient: Map-Reduce Framework
11/03/15 10:14:10 INFO mapred.JobClient: Map input records=5
11/03/15 10:14:10 INFO mapred.JobClient: Spilled Records=0
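Here the filter argument (prefix) does not start with ^, so it is treated as a literal row-key prefix; roughly the equivalent of the following (again an illustration, not the exact patch code):

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

// Illustration only: a PrefixFilter keeps rows whose key starts with the
// given bytes, so "prefix" selects the five prefix_row_* keys above.
Scan s = new Scan();
s.setFilter(new PrefixFilter(Bytes.toBytes("prefix")));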
===================================================================================================
Import of the prefix-based export - imports 5 rows
===================================================================================================
java -cp hbase-0.91.0-SNAPSHOT.jar:./lib/hadoop-0.20.1-core.jar:./lib/commons-logging-1.1.1.jar:./lib/commons-cli-1.2.jar:./lib/zookeeper-3.3.2.jar:./lib/log4j-1.2.16.jar:./lib/commons-httpclient-3.1.jar \
  org.apache.hadoop.hbase.mapreduce.Import \
  planets /work/HBaseExport/planets/prefix
11/03/15 10:15:15 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0'
done.
11/03/15 10:15:16 INFO mapred.JobClient: map 100% reduce 0%
11/03/15 10:15:16 INFO mapred.JobClient: Job complete: job_local_0001
11/03/15 10:15:16 INFO mapred.JobClient: Counters: 4
11/03/15 10:15:16 INFO mapred.JobClient: FileSystemCounters
11/03/15 10:15:16 INFO mapred.JobClient: FILE_BYTES_READ=2436777
11/03/15 10:15:16 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2475715
11/03/15 10:15:16 INFO mapred.JobClient: Map-Reduce Framework
11/03/15 10:15:16 INFO mapred.JobClient: Map input records=5
11/03/15 10:15:16 INFO mapred.JobClient: Spilled Records=0
hbase(main):022:0> scan 'planets'
ROW              COLUMN+CELL
 prefix_row_11   column=cf1:col1, timestamp=1300209089377, value=prefix_sun
 prefix_row_22   column=cf1:col1, timestamp=1300209089497, value=prefix_moon
 prefix_row_33   column=cf1:col1, timestamp=1300209089609, value=prefix_mars
 prefix_row_44   column=cf1:col1, timestamp=1300209089786, value=prefix_mercury
 prefix_row_55   column=cf1:col1, timestamp=1300209090453, value=prefix_jupiter
5 row(s) in 0.0720 seconds
===================================================================================================
Import of the RegEx-based export - imports 14 rows
===================================================================================================
java -cp hbase-0.91.0-SNAPSHOT.jar:./lib/hadoop-0.20.1-core.jar:./lib/commons-logging-1.1.1.jar:./lib/commons-cli-1.2.jar:./lib/zookeeper-3.3.2.jar:./lib/log4j-1.2.16.jar:./lib/commons-httpclient-3.1.jar \
  org.apache.hadoop.hbase.mapreduce.Import \
  planets /work/HBaseExport/planets/regex
11/03/15 10:16:48 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0'
done.
11/03/15 10:16:49 INFO mapred.JobClient: map 100% reduce 0%
11/03/15 10:16:49 INFO mapred.JobClient: Job complete: job_local_0001
11/03/15 10:16:49 INFO mapred.JobClient: Counters: 4
11/03/15 10:16:49 INFO mapred.JobClient: FileSystemCounters
11/03/15 10:16:49 INFO mapred.JobClient: FILE_BYTES_READ=2437357
11/03/15 10:16:49 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2475711
11/03/15 10:16:49 INFO mapred.JobClient: Map-Reduce Framework
11/03/15 10:16:49 INFO mapred.JobClient: Map input records=14
11/03/15 10:16:49 INFO mapred.JobClient: Spilled Records=0
hbase(main):023:0> scan 'planets'
ROW              COLUMN+CELL
 prefix_row_11   column=cf1:col1, timestamp=1300209089377, value=prefix_sun
 prefix_row_22   column=cf1:col1, timestamp=1300209089497, value=prefix_moon
 prefix_row_33   column=cf1:col1, timestamp=1300209089609, value=prefix_mars
 prefix_row_44   column=cf1:col1, timestamp=1300209089786, value=prefix_mercury
 prefix_row_55   column=cf1:col1, timestamp=1300209090453, value=prefix_jupiter
 row_11          column=cf1:col1, timestamp=1300208484184, value=sun
 row_22          column=cf1:col1, timestamp=1300208500801, value=moon
 row_33          column=cf1:col1, timestamp=1300208510790, value=mars
 row_44          column=cf1:col1, timestamp=1300208525242, value=mercury
 row_55          column=cf1:col1, timestamp=1300208643830, value=jupiter
 row_66          column=cf1:col1, timestamp=1300208643961, value=venus
 row_77          column=cf1:col1, timestamp=1300208644080, value=saturn
 row_88          column=cf1:col1, timestamp=1300208644220, value=raghu
 row_99          column=cf1:col1, timestamp=1300208645195, value=kethu
14 row(s) in 0.1570 seconds
===================================================================================================
> Allow record filtering with selected row key values in HBase Export
> -------------------------------------------------------------------
>
> Key: HBASE-2495
> URL: https://issues.apache.org/jira/browse/HBASE-2495
> Project: HBase
> Issue Type: Improvement
> Components: util
> Affects Versions: 0.20.3
> Reporter: Ted Yu
> Labels: moved_from_0_20_5
> Fix For: 0.92.0
>
> Attachments:
> HBASE-2495_-_Allow_record_filtering_with_selected_row_key_values_in_HBase_Export.patch
>
>
> It is desirable to add record filtering capability to HBase Export.
> The following code is an example (s is the Scan):
> byte[] prefix = Bytes.toBytes(args[5]);
> if (args[5].startsWith("^")) {
>   s.setFilter(new RowFilter(CompareOp.EQUAL, new RegexStringComparator(args[5])));
> } else {
>   s.setFilter(new PrefixFilter(prefix));
> }