[ https://issues.apache.org/jira/browse/HBASE-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461184#comment-13461184 ]
Alexander Alten-Lorenz commented on HBASE-6694:
-----------------------------------------------
Confirmed that the patch is working. Job.xml contains:
Test data: over a whole day, I created columns in family cf1 of table test1 with:
hbase shell> for r in 1 .. 10 do for c in 1 .. 100000000 do put 'test1', "row-#{r}", "cf1:c#{c}", "1" end end
===========
With -Dhbase.export.scanner.batch=100:
HBASE_CLASSPATH="/usr/lib/hadoop-0.20-mapreduce/hadoop-core-2.0.0-mr1-cdh4.0.1.jar" \
bin/hbase org.apache.hadoop.hbase.mapreduce.Export \
  -Dhbase.export.scanner.batch=100 test1 /home/hdfs/test2.export
---
{code}
12/09/22 01:17:23 DEBUG mapreduce.TableInputFormatBase: getSplits: split -> 0 -> hadoop4.internal:,
12/09/22 01:17:24 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
12/09/22 01:17:24 WARN conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
12/09/22 01:17:25 INFO mapred.JobClient: Running job: job_201209212254_0010
12/09/22 01:17:26 INFO mapred.JobClient: map 0% reduce 0%
12/09/22 01:17:59 INFO mapred.JobClient: map 100% reduce 0%
12/09/22 01:18:02 INFO mapred.JobClient: Job complete: job_201209212254_0010
12/09/22 01:18:03 INFO mapred.JobClient: Counters: 24
12/09/22 01:18:03 INFO mapred.JobClient: File System Counters
12/09/22 01:18:03 INFO mapred.JobClient: FILE: Number of bytes read=0
12/09/22 01:18:03 INFO mapred.JobClient: FILE: Number of bytes written=84332
12/09/22 01:18:03 INFO mapred.JobClient: FILE: Number of read operations=0
12/09/22 01:18:03 INFO mapred.JobClient: FILE: Number of large read operations=0
12/09/22 01:18:03 INFO mapred.JobClient: FILE: Number of write operations=0
12/09/22 01:18:03 INFO mapred.JobClient: HDFS: Number of bytes read=70
12/09/22 01:18:03 INFO mapred.JobClient: HDFS: Number of bytes written=62070728
12/09/22 01:18:03 INFO mapred.JobClient: HDFS: Number of read operations=1
12/09/22 01:18:03 INFO mapred.JobClient: HDFS: Number of large read operations=0
12/09/22 01:18:03 INFO mapred.JobClient: HDFS: Number of write operations=1
12/09/22 01:18:03 INFO mapred.JobClient: Job Counters
12/09/22 01:18:03 INFO mapred.JobClient: Launched map tasks=1
12/09/22 01:18:03 INFO mapred.JobClient: Data-local map tasks=1
12/09/22 01:18:03 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=35760
12/09/22 01:18:03 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
12/09/22 01:18:03 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/09/22 01:18:03 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/09/22 01:18:03 INFO mapred.JobClient: Map-Reduce Framework
12/09/22 01:18:03 INFO mapred.JobClient: Map input records=15258
12/09/22 01:18:03 INFO mapred.JobClient: Map output records=15258
12/09/22 01:18:03 INFO mapred.JobClient: Input split bytes=70
12/09/22 01:18:03 INFO mapred.JobClient: Spilled Records=0
12/09/22 01:18:03 INFO mapred.JobClient: CPU time spent (ms)=5970
12/09/22 01:18:03 INFO mapred.JobClient: Physical memory (bytes) snapshot=106557440
12/09/22 01:18:03 INFO mapred.JobClient: Virtual memory (bytes) snapshot=570249216
12/09/22 01:18:03 INFO mapred.JobClient: Total committed heap usage (bytes)=42663936
{code}
The export is readable in HDFS.
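For context on why the batched run reports far more map input records (15258) than the table has rows (at most 10): scanner batching caps how many cells a single Result may carry (as I understand HBASE-6372, hbase.export.scanner.batch is handed to Scan#setBatch), so one very wide row reaches the mapper as several partial records. Below is a hypothetical Ruby simulation of that splitting only. It is not HBase code; batched_results and the row widths are made up for illustration.

```ruby
# Hypothetical simulation of scanner batching; not the HBase implementation.
# row_widths maps a row key to its number of cells; batch is the maximum
# number of cells per Result (nil or <= 0 means "no batching").
# Returns one [row, cell_count] pair per Result the mapper would receive.
def batched_results(row_widths, batch)
  row_widths.flat_map do |row, cells|
    if batch.nil? || batch <= 0
      [[row, cells]] # whole row arrives as a single Result
    else
      full, rest = cells.divmod(batch)
      chunks = Array.new(full) { [row, batch] } # full-size partial Results
      chunks << [row, rest] if rest > 0          # trailing partial Result
      chunks
    end
  end
end

widths = { "row-1" => 250, "row-2" => 100 }
p batched_results(widths, 100).length  # 4 records instead of 2
p batched_results(widths, nil).length  # 2 records, one per row
```

The point of the trade-off is that no single next() call has to materialize an entire very wide row in memory; the cost is more, smaller records per row.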
==========
Without the -D switch, the RegionServer timed out:
{code}
2012-09-22 01:27:27,937 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired /hbase/splitlog/hdfs%3A%2F%2Fhadoop4%3A8020%2Fhbase%2F.logs%2Fhadoop4.internal%2C60020%2C1348269190284-splitting%2Fhadoop4.internal%252C60020%252C1348269190284.1348269200784 ver = 0
2012-09-22 01:27:28,938 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 unassigned = 1
2012-09-22 01:27:28,938 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: resubmitting unassigned task(s) after timeout
2012-09-22 01:27:29,237 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired /hbase/splitlog/hdfs%3A%2F%2Fhadoop4%3A8020%2Fhbase%2F.logs%2Fhadoop4.internal%2C60020%2C1348269190284-splitting%2Fhadoop4.internal%252C60020%252C1348269190284.1348269200784 ver = 0
2012-09-22 01:27:29,239 INFO org.apache.hadoop.hbase.master.SplitLogManager: task /hbase/splitlog/RESCAN0000000057 entered state done hadoop4.internal,60000,1348269184131
2012-09-22 01:27:29,239 INFO org.apache.hadoop.hbase.master.SplitLogManager: task /hbase/splitlog/RESCAN0000000058 entered state done hadoop4.internal,60000,1348269184131
2012-09-22 01:27:29,282 DEBUG org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted /hbase/splitlog/RESCAN0000000057
2012-09-22 01:27:29,282 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: deleted task without in memory state /hbase/splitlog/RESCAN0000000057
2012-09-22 01:27:29,283 DEBUG org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted /hbase/splitlog/RESCAN0000000058
2012-09-22 01:27:29,283 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: deleted task without in memory state /hbase/splitlog/RESCAN0000000058
{code}
====
Test environment: a virtual machine with 2 GB RAM and a 512 MB heap exported for HBase; Hadoop in cluster mode, HBase pseudo-distributed. I would say it worked.
> Test scanner batching in export job feature HBASE-6372 AND report on improvement HBASE-6372 adds
> ------------------------------------------------------------------------------------------------
>
> Key: HBASE-6694
> URL: https://issues.apache.org/jira/browse/HBASE-6694
> Project: HBase
> Issue Type: Task
> Reporter: stack
> Assignee: Alexander Alten-Lorenz
> Attachments: HBASE-6694.patch
>
>
> From tail of HBASE-6372, Jon had raised issue that test added did not
> actually test the feature. This issue is about adding a test of HBASE-6372.
> We should also have numbers for the improvement that HBASE-6372 brings.
--
This message is automatically generated by JIRA.