Morris Swertz
Thu, 19 Nov 2009 08:20:49 -0800
Hi all,

I am trying to load data from HBase into Pig with HBaseStorage. Something is going wrong, because no data from HBase (the 'test' table) shows up in Pig; I only get errors.
I configured Hadoop and HBase in pseudo-distributed mode. What follows are the commands I ran and the output they produced.

//try with pig in remote mode!
pig -x mapreduce
B = load 'hbase://test' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('data') as (col_a);
dump B;

output:
2009-11-19 13:56:02,810 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2009-11-19 13:56:02,810 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2009-11-19 13:56:04,708 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2009-11-19 13:56:04,729 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2009-11-19 13:56:04,739 [Thread-5] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-11-19 13:56:05,024 [Thread-5] INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage - tablename: file:/Users/jorislops/Desktop/pig-0.5.0/test
2009-11-19 13:56:05,231 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2009-11-19 13:56:06,222 [Thread-5] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: localhost/127.0.0.1:60000. Already tried 0 time(s).
2009-11-19 13:56:06,222 [Thread-5] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: localhost/127.0.0.1:60000. Already tried 1 time(s).
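
A minimal sketch for triaging the log above, assuming the console output was saved to a file (pig.log is a hypothetical name). It pulls out the value that HBaseStorage logged as the table name and the address the Hadoop IPC client kept retrying, then probes a port. The helper names are made up for illustration and are not part of Pig, Hadoop, or HBase; port_status relies on bash's /dev/tcp redirection, so it assumes bash rather than a plain POSIX sh.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading a saved Pig console log; not part of any tool.

# Print the value HBaseStorage logged after "tablename:" (log is read from stdin).
extract_tablename() {
  sed -n 's|.*HBaseStorage - tablename: ||p'
}

# Print each distinct ip:port the Hadoop IPC client reported retrying (stdin).
extract_retry_targets() {
  sed -n 's|.*Retrying connect to server: [^/]*/\([0-9.]*:[0-9]*\).*|\1|p' | sort -u
}

# Print "open" if a TCP connection to host:port succeeds, "closed" otherwise.
# Uses bash's /dev/tcp pseudo-device; the subshell closes the socket on exit.
port_status() {
  local host="$1" port="$2"
  if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# Example usage (pig.log is an assumed file name):
#   extract_tablename < pig.log
#   extract_retry_targets < pig.log
#   port_status 127.0.0.1 60000
```

Fed the log lines above, extract_tablename would print file:/Users/jorislops/Desktop/pig-0.5.0/test, which is a local file path rather than the plain table name 'test', and extract_retry_targets would print 127.0.0.1:60000.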
//port 60000 is used by a java program

pig -x local
B = load 'test' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('data') as (col_a);
dump B;

output:
2009-11-19 13:53:18,425 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully stored result in: "file:/tmp/temp-1663248768/tmp-1939618752"
2009-11-19 13:53:18,436 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records written : 0
2009-11-19 13:53:18,436 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written : 0
2009-11-19 13:53:18,436 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-11-19 13:53:18,436 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
//there is nothing in /tmp/temp-1663248768/tmp-1939618752 (it's empty)

I tried different paths to the HBase table: 'hbase://test', 'test', and 'hbase://localhost:60000/test'.

Here is how I started the system (Hadoop + HBase); I verified that it is working as I expected.
bin/hadoop namenode -format
bin/start-all.sh
//both NameNode and JobTracker are running, verified at http://localhost:50070 and http://localhost:500040

bin/start-hbase.sh
//both master and regionserver are running, checked at localhost:60010 localhost:20 localhost:30
//also the ZooKeeper quorum is started at localhost:2181

//fill a test table in hbase
hbase-0.20.1/bin/hbase shell
create 'test', 'data'
put 'test', 'row1', 'data', 'value1'
scan 'test'
//localhost:60010 shows that the test table is in HBase.

Hope that someone knows the solution.

Thanks,
Joris