I assume there's some config parameter not set: I've installed the current Hive trunk, installed HBase 0.20.3, and am using Hadoop Core 0.20.2, and I'm still seeing the same behavior (the HBase table is visible via Hive, but only "select *" works against it; "select count(1)" and other map/reduce-based operations return no data).
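If it would help, I can re-run the failing query with more client-side logging. A minimal sketch of what I'd run, assuming the trunk build still honors the hive.root.logger override:

  $ hive -hiveconf hive.root.logger=DEBUG,console
  hive> select count(1) from hbase_table_1;

That should at least show more about how the client plans the job; I can post that output if it's useful.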
Here's some info on what I see. I'd appreciate any leads you can give me.

DETAILS

There are no errors, because no mappers are invoked. The JobTracker page for job_201009070004_0001 shows the job at 100% with zero map and zero reduce tasks:

  map      100.00%    0 / 0
  reduce   100.00%    0 / 0

-- hadoop-env.sh

  $ cat ./hadoop-env.sh | grep CLASS
  HADOOP_CLASSPATH=/hbase/hbase-0.20.3-test.jar:/hbase/hbase-0.20.3.jar:/hbase/lib/zookeeper-3.2.2.jar:$HADOOP_CLASSPATH
  HADOOP_CLASSPATH=/hive/lib/:/hive/lib/*jar:/hive/conf/:$HADOOP_CLASSPATH
  export HADOOP_CLASSPATH

--

I'm using the Derby Hive metastore. Since it creates ./metastore_db/ in the current working directory, I invoke "hive" with this alias:

  alias hive='cd ~/; /hive/bin/hive --auxpath /hive/lib/hive_hbase-handler.jar,/hive/lib/hbase-0.20.3.jar,/hive/lib/zookeeper-3.2.2.jar -hiveconf hbase.zookeeper.quorum=pos01n,pos02n,sux01n'
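One thing I'm now second-guessing in that hadoop-env.sh: as far as I know, neither the shell nor the JVM will expand "/hive/lib/*jar" inside a colon-separated classpath (globs aren't expanded in a variable assignment, and the Java 6 classpath wildcard is a bare "*"), so that entry probably adds nothing. A sketch of what I think was intended:

  # hadoop-env.sh -- pick up every jar under /hive/lib plus the Hive conf dir
  HADOOP_CLASSPATH=/hive/lib/*:/hive/conf/:$HADOOP_CLASSPATH
  export HADOOP_CLASSPATH

I don't know whether that's related to the zero-mapper behavior, since --auxpath is also supposed to ship the handler jars with the job.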
  hive> select * from hbase_table_1 limit 4;
  OK
  1802051275    0000b87c1142193304e47e97cf981fc9
  1802051477    00209a5ea0e2524b1fccb8cdd9b4836b
  1802051645    00100073215fb9b53c8c5e0b1e571cf4
  1802051659    00103d6db62b61ab0063a908317e2b43
  Time taken: 0.109 seconds

  hive> select key from hbase_table_1 limit 5;
  Total MapReduce jobs = 1
  Launching Job 1 out of 1
  Number of reduce tasks is set to 0 since there's no reduce operator
  Starting Job = job_201009070004_0001, Tracking URL = http://pos01n:50030/jobdetails.jsp?jobid=job_201009070004_0001
  Kill Command = /hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=pos01n:9001 -kill job_201009070004_0001
  2010-09-07 00:08:12,711 Stage-1 map = 0%, reduce = 0%
  2010-09-07 00:08:15,733 Stage-1 map = 100%, reduce = 100%
  Ended Job = job_201009070004_0001
  OK
  Time taken: 8.783 seconds

-- TaskTracker log (pos01n)

  /************************************************************
  STARTUP_MSG: Starting TaskTracker
  STARTUP_MSG:   host = pos01n/192.168.36.240
  STARTUP_MSG:   args = []
  STARTUP_MSG:   version = 0.20.2
  STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
  ************************************************************/
  2010-09-07 00:04:21,180 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
  2010-09-07 00:04:21,282 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50060
  2010-09-07 00:04:21,288 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50060 webServer.getConnectors()[0].getLocalPort() returned 50060
  2010-09-07 00:04:21,288 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50060
  2010-09-07 00:04:21,288 INFO org.mortbay.log: jetty-6.1.14
  2010-09-07 00:04:28,541 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50060
  2010-09-07 00:04:28,653 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=TaskTracker, sessionId=
  2010-09-07 00:04:28,667 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=TaskTracker, port=54194
  2010-09-07 00:04:28,706 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
  2010-09-07 00:04:28,708 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 54194: starting
  2010-09-07 00:04:28,707 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 54194: starting
  2010-09-07 00:04:28,709 INFO org.apache.hadoop.mapred.TaskTracker: TaskTracker up at: localhost.localdomain/127.0.0.1:54194
  2010-09-07 00:04:28,709 INFO org.apache.hadoop.mapred.TaskTracker: Starting tracker tracker_pos01n.tripadvisor.com:localhost.localdomain/127.0.0.1:54194
  2010-09-07 00:04:28,711 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 54194: starting
  2010-09-07 00:04:56,363 INFO org.apache.hadoop.mapred.TaskTracker: Using MemoryCalculatorPlugin : org.apache.hadoop.util.LinuxMemoryCalculatorPlugin@30e34726
  2010-09-07 00:04:56,372 INFO org.apache.hadoop.mapred.TaskTracker: Starting thread: Map-events fetcher for all reduce tasks on tracker_pos01n.tripadvisor.com:localhost.localdomain/127.0.0.1:54194
  2010-09-07 00:04:56,375 WARN org.apache.hadoop.mapred.TaskTracker: TaskTracker's totalMemoryAllottedForTasks is -1. TaskMemoryManager is disabled.
  2010-09-07 00:04:56,376 INFO org.apache.hadoop.mapred.IndexCache: IndexCache created with max memory = 10485760
  2010-09-07 00:07:53,803 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201009070004_0001_m_000001_0 task's state:UNASSIGNED
  2010-09-07 00:07:53,805 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201009070004_0001_m_000001_0
  2010-09-07 00:07:53,805 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 1 and trying to launch attempt_201009070004_0001_m_000001_0
  2010-09-07 00:07:54,432 WARN org.apache.hadoop.fs.FileSystem: "pos01n:54310" is a deprecated filesystem name. Use "hdfs://pos01n:54310/" instead.
  2010-09-07 00:07:54,468 WARN org.apache.hadoop.fs.FileSystem: "pos01n:54310" is a deprecated filesystem name. Use "hdfs://pos01n:54310/" instead.
  2010-09-07 00:07:54,703 WARN org.apache.hadoop.fs.FileSystem: "pos01n:54310" is a deprecated filesystem name. Use "hdfs://pos01n:54310/" instead.
  2010-09-07 00:07:54,847 WARN org.apache.hadoop.fs.FileSystem: "pos01n:54310" is a deprecated filesystem name. Use "hdfs://pos01n:54310/" instead.
  2010-09-07 00:07:54,946 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201009070004_0001_m_-1107468038
  2010-09-07 00:07:54,946 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201009070004_0001_m_-1107468038 spawned.
  2010-09-07 00:07:55,364 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201009070004_0001_m_-1107468038 given task: attempt_201009070004_0001_m_000001_0
  2010-09-07 00:07:55,699 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201009070004_0001_m_000001_0 0.0% setup
  2010-09-07 00:07:55,701 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201009070004_0001_m_000001_0 is done.
  2010-09-07 00:07:55,701 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201009070004_0001_m_000001_0 was 0
  2010-09-07 00:07:55,703 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 1
  2010-09-07 00:07:55,885 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201009070004_0001_m_-1107468038 exited. Number of tasks it ran: 1
  2010-09-07 00:07:56,805 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201009070004_0001/attempt_201009070004_0001_m_000001_0/output/file.out in any of the configured local directories
  2010-09-07 00:07:56,834 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201009070004_0001_m_000000_0 task's state:UNASSIGNED
  2010-09-07 00:07:56,835 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201009070004_0001_m_000000_0
  2010-09-07 00:07:56,835 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 1 and trying to launch attempt_201009070004_0001_m_000000_0
  2010-09-07 00:07:56,835 INFO org.apache.hadoop.mapred.TaskTracker: Received KillTaskAction for task: attempt_201009070004_0001_m_000001_0
  2010-09-07 00:07:56,836 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_201009070004_0001_m_000001_0
  2010-09-07 00:07:56,837 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201009070004_0001_m_000001_0 done; removing files.
  2010-09-07 00:07:56,838 INFO org.apache.hadoop.mapred.IndexCache: Map ID attempt_201009070004_0001_m_000001_0 not found in cache
  2010-09-07 00:07:56,865 WARN org.apache.hadoop.fs.FileSystem: "pos01n:54310" is a deprecated filesystem name. Use "hdfs://pos01n:54310/" instead.
  2010-09-07 00:07:56,867 WARN org.apache.hadoop.fs.FileSystem: "pos01n:54310" is a deprecated filesystem name. Use "hdfs://pos01n:54310/" instead.
  2010-09-07 00:07:56,869 WARN org.apache.hadoop.fs.FileSystem: "pos01n:54310" is a deprecated filesystem name. Use "hdfs://pos01n:54310/" instead.
  2010-09-07 00:07:56,871 WARN org.apache.hadoop.fs.FileSystem: "pos01n:54310" is a deprecated filesystem name. Use "hdfs://pos01n:54310/" instead.
  2010-09-07 00:07:56,897 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201009070004_0001_m_970995359
  2010-09-07 00:07:56,897 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201009070004_0001_m_970995359 spawned.
  2010-09-07 00:07:57,318 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201009070004_0001_m_970995359 given task: attempt_201009070004_0001_m_000000_0
  2010-09-07 00:07:57,647 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201009070004_0001_m_000000_0 0.0%
  2010-09-07 00:07:57,650 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201009070004_0001_m_000000_0 0.0% cleanup
  2010-09-07 00:07:57,651 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201009070004_0001_m_000000_0 is done.
  2010-09-07 00:07:57,651 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201009070004_0001_m_000000_0 was 0
  2010-09-07 00:07:57,652 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 1
  2010-09-07 00:07:57,823 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201009070004_0001_m_970995359 exited. Number of tasks it ran: 1
  2010-09-07 00:07:59,837 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201009070004_0001/attempt_201009070004_0001_m_000000_0/output/file.out in any of the configured local directories
  2010-09-07 00:07:59,871 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201009070004_0001
  2010-09-07 00:07:59,871 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201009070004_0001_m_000000_0 done; removing files.
  2010-09-07 00:07:59,872 INFO org.apache.hadoop.mapred.IndexCache: Map ID attempt_201009070004_0001_m_000000_0 not found in cache

(Note that the only two attempts in that log are the job setup task, "0.0% setup", and the cleanup task, "0.0% cleanup"; no actual map over the table is ever scheduled, which matches the 0/0 task counts in the JobTracker UI.)

--

The sequential write test does populate the table (as seen via the hbase shell). After running:

  hadoop org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 3

30 mappers ran and showed no errors, producing the following output:

  ...
  10/09/07 00:18:57 INFO mapred.JobClient: map 100% reduce 28%
  10/09/07 00:19:12 INFO mapred.JobClient: map 100% reduce 30%
  10/09/07 00:19:24 INFO mapred.JobClient: map 100% reduce 100%
  10/09/07 00:19:32 INFO mapred.JobClient: Job complete: job_201009070004_0003
  10/09/07 00:19:32 INFO mapred.JobClient: Counters: 17
  10/09/07 00:19:32 INFO mapred.JobClient:   HBase Performance Evaluation
  10/09/07 00:19:32 INFO mapred.JobClient:     Row count=3145710
  10/09/07 00:19:32 INFO mapred.JobClient:     Elapsed time in milliseconds=2277702
  10/09/07 00:19:32 INFO mapred.JobClient:   Job Counters
  10/09/07 00:19:32 INFO mapred.JobClient:     Launched reduce tasks=1
  10/09/07 00:19:32 INFO mapred.JobClient:     Launched map tasks=30
  10/09/07 00:19:32 INFO mapred.JobClient:   FileSystemCounters
  10/09/07 00:19:32 INFO mapred.JobClient:     FILE_BYTES_READ=546
  10/09/07 00:19:32 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2226
  10/09/07 00:19:32 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=414
  10/09/07 00:19:32 INFO mapred.JobClient:   Map-Reduce Framework
  10/09/07 00:19:32 INFO mapred.JobClient:     Reduce input groups=30
  10/09/07 00:19:32 INFO mapred.JobClient:     Combine output records=0
  10/09/07 00:19:32 INFO mapred.JobClient:     Map input records=30
  10/09/07 00:19:32 INFO mapred.JobClient:     Reduce shuffle bytes=696
  10/09/07 00:19:32 INFO mapred.JobClient:     Reduce output records=30
  10/09/07 00:19:32 INFO mapred.JobClient:     Spilled Records=60
  10/09/07 00:19:32 INFO mapred.JobClient:     Map output bytes=480
  10/09/07 00:19:32 INFO mapred.JobClient:     Combine input records=0
  10/09/07 00:19:32 INFO mapred.JobClient:     Map output records=30
  10/09/07 00:19:32 INFO mapred.JobClient:     Reduce input records=30
  10/09/07 00:19:32 INFO zookeeper.ZooKeeper: Closing session: 0x22aea5e6d8d0001
  10/09/07 00:19:32 INFO zookeeper.ClientCnxn: Closing ClientCnxn for session: 0x22aea5e6d8d0001
  10/09/07 00:19:32 INFO zookeeper.ClientCnxn: Exception while closing send thread for session 0x22aea5e6d8d0001 : Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
  10/09/07 00:19:32 INFO zookeeper.ClientCnxn: Disconnecting ClientCnxn for session: 0x22aea5e6d8d0001
  10/09/07 00:19:32 INFO zookeeper.ZooKeeper: Session: 0x22aea5e6d8d0001 closed
  10/09/07 00:19:32 INFO zookeeper.ClientCnxn: EventThread shut down
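For what it's worth, here's the sort of check I've been doing from the hbase shell to confirm rows are really there, completely outside Hive (a minimal sketch; 'TestTable' is the table PerformanceEvaluation writes to by default, and 'xyz' is the table behind hbase_table_1, per the describe output quoted below):

  $ hbase shell
  hbase(main):001:0> count 'TestTable'
  hbase(main):002:0> count 'xyz'

Both counts come back non-empty, which is what makes the zero-row MapReduce reads so puzzling.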
A CREATE TABLE ... AS SELECT shows the same thing (the job "succeeds" but copies nothing):

  hive> CREATE TABLE test
      > AS SELECT * from hbase_table_1;
  Total MapReduce jobs = 2
  Launching Job 1 out of 2
  Number of reduce tasks is set to 0 since there's no reduce operator
  Starting Job = job_201009062337_0002, Tracking URL = http://pos01n:50030/jobdetails.jsp?jobid=job_201009062337_0002
  Kill Command = /hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=pos01n:9001 -kill job_201009062337_0002
  2010-09-06 23:42:09,566 Stage-1 map = 0%, reduce = 0%
  2010-09-06 23:42:12,582 Stage-1 map = 100%, reduce = 100%
  Ended Job = job_201009062337_0002
  Ended Job = 1321600821, job is filtered out (removed at runtime).
  Moving data to: hdfs://pos01n:54310/data1/hive_scratchdir/hive_2010-09-06_23-42-03_055_2351688256300251976/-ext-10001
  Moving data to: /user/hive/warehouse/test
  OK
  Time taken: 9.694 seconds

  hive> select * from hbase_table_1 limit 3;
  OK
  1802051275    0000b87c1142193304e47e97cf981fc9
  1802051477    00209a5ea0e2524b1fccb8cdd9b4836b
  1802051645    00100073215fb9b53c8c5e0b1e571cf4
  Time taken: 0.111 seconds

On Mon, Sep 6, 2010 at 5:16 PM, John Sichi <jsi...@facebook.com> wrote:

> Hmmm, anything interesting in the task logs? Seems like somehow the task
> tracker nodes can't see the HBase table whereas the client node can, but
> then I would expect to see an error instead of zero rows.
>
> JVS
>
> On Sep 4, 2010, at 4:36 PM, phil young wrote:
>
> I can confirm the HBase table is populated, via "SELECT *" or the hbase
> shell. But when I read or copy the table via a mapreduce job, no rows
> are returned.
>
> I'm hoping someone will recognize this as some sort of configuration
> problem. The stack is: Hadoop 0.20.2, HBase 0.20.3, and Hive from the
> trunk as of ~8/20.
>
> Here are the statements that show the problem:
>
> hive> select * from hbase_table_1 limit 5;
> OK
> 500184511    033ee0111f22bbf5786f80df3d163834
> 500184512    030c23751e42fa5e01d05daf5a028e8b
> 500184516    01945892c252a55da843c692f4b1bd77
> 500184542    0078d187207d1f1777524b027f826b19
> 500184662    036e9bd88dba12bfc6943f417d29302f
> Time taken: 0.087 seconds
>
> hive> select key, value from hbase_table_1 limit 5;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_201009041301_0030, Tracking URL =
> http://pos01n:50030/jobdetails.jsp?jobid=job_201009041301_0030
> Kill Command = /hadoop/bin/../bin/hadoop job
> -Dmapred.job.tracker=pos01n:9001 -kill job_201009041301_0030
> 2010-09-04 19:04:34,673 Stage-1 map = 0%, reduce = 0%
> 2010-09-04 19:04:37,685 Stage-1 map = 100%, reduce = 100%
> Ended Job = job_201009041301_0030
> OK
> Time taken: 8.386 seconds
>
> hive> describe extended hbase_table_1;
> OK
> key      int      from deserializer
> value    string   from deserializer
>
> Detailed Table Information  Table(tableName:hbase_table_1, dbName:default,
> owner:root, createTime:1283637617, lastAccessTime:0, retention:0,
> sd:StorageDescriptor(cols:[FieldSchema(name:key, type:int, comment:null),
> FieldSchema(name:value, type:string, comment:null)],
> location:hdfs://pos01n:54310/user/hive/warehouse/hbase_table_1,
> inputFormat:org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat,
> outputFormat:org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat,
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.hbase.HBaseSerDe,
> parameters:{serialization.format=1, hbase.columns.mapping=:key,cf1:val}),
> bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[], parameters:{
> hbase.table.name=xyz, transient_lastDdlTime=1283637617,
> storage_handler=org.apache.hadoop.hive.hbase.HBaseStorageHandler},
> viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
>
> Of course, I appreciate the help. Hopefully I'll find that HBase can
> solve my problem, become a user, and be able to return the favor some
> day ;)
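P.S. For reference, here is the DDL I believe corresponds to that describe output; a reconstruction from the serde and table parameters above (hbase.columns.mapping, hbase.table.name, storage_handler), not necessarily the exact statement as typed:

  CREATE TABLE hbase_table_1 (key int, value string)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
  TBLPROPERTIES ("hbase.table.name" = "xyz");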