I assume some config parameter is not set: I've installed the current Hive
trunk, installed HBase 0.20.3, and am using Hadoop Core 0.20.2, and I'm
still seeing the same behavior (i.e. the HBase table is loaded via Hive,
but only "select *" works against it; "select count(1)" and other
operations return no data).
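
For context on why only "select *" works: as far as I understand it, Hive
serves a bare "select * ... limit N" with a client-side fetch task that
reads from HBase directly in the CLI process, while anything else compiles
to a MapReduce job whose map tasks run on the cluster. So the failure looks
isolated to the cluster-side read path:

select * from hbase_table_1 limit 4;    -- client-side fetch: returns rows
select count(1) from hbase_table_1;     -- MapReduce job: no data
select key from hbase_table_1 limit 5;  -- MapReduce job: no rows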

Here's the relevant info I'm seeing; I'd appreciate any leads you can give me.




DETAILS

There are no errors, because no mappers are invoked. The JobTracker UI for
job_201009070004_0001 shows the job completing with zero tasks:

map:    100.00%   0 total / 0 complete
reduce: 100.00%   0 total / 0 complete



-- hadoop-env.sh:
cat ./hadoop-env.sh | grep CLASS
HADOOP_CLASSPATH=/hbase/hbase-0.20.3-test.jar:/hbase/hbase-0.20.3.jar:/hbase/lib/zookeeper-3.2.2.jar:$HADOOP_CLASSPATH
HADOOP_CLASSPATH=/hive/lib/:/hive/lib/*jar:/hive/conf/:$HADOOP_CLASSPATH
export HADOOP_CLASSPATH
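
One thing I'm unsure about in the second line (flagging it as a guess): I
believe Java's classpath wildcard expansion only recognizes a bare "*"
entry (e.g. "/hive/lib/*"), not a pattern like "/hive/lib/*jar", so that
entry may not actually put the Hive jars on the tasktracker classpath.
What I intended was:

HADOOP_CLASSPATH=/hive/lib/*:/hive/conf/:$HADOOP_CLASSPATH
export HADOOP_CLASSPATH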

-- I'm using the Derby Hive metastore. Since it creates ./metastore_db/ in
the current working dir, I invoke "hive" with this alias:
alias hive='cd ~/; /hive/bin/hive --auxpath
/hive/lib/hive_hbase-handler.jar,/hive/lib/hbase-0.20.3.jar,/hive/lib/zookeeper-3.2.2.jar
-hiveconf hbase.zookeeper.quorum=pos01n,pos02n,sux01n'
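
In case the alias is part of the problem, I believe the same settings can
go into hive-site.xml instead (a sketch using my local paths;
hive.aux.jars.path is the property I understand --auxpath maps to):

<property>
  <name>hive.aux.jars.path</name>
  <value>file:///hive/lib/hive_hbase-handler.jar,file:///hive/lib/hbase-0.20.3.jar,file:///hive/lib/zookeeper-3.2.2.jar</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>pos01n,pos02n,sux01n</value>
</property>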



-- Query behavior. A bare "select *" returns rows:

select * from hbase_table_1 limit 4;
OK
1802051275 0000b87c1142193304e47e97cf981fc9
1802051477 00209a5ea0e2524b1fccb8cdd9b4836b
1802051645 00100073215fb9b53c8c5e0b1e571cf4
1802051659 00103d6db62b61ab0063a908317e2b43
Time taken: 0.109 seconds


-- But anything that goes through MapReduce returns nothing:

select key from hbase_table_1 limit 5;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201009070004_0001, Tracking URL =
http://pos01n:50030/jobdetails.jsp?jobid=job_201009070004_0001
Kill Command = /hadoop/bin/../bin/hadoop job
 -Dmapred.job.tracker=pos01n:9001 -kill job_201009070004_0001
2010-09-07 00:08:12,711 Stage-1 map = 0%,  reduce = 0%
2010-09-07 00:08:15,733 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201009070004_0001
OK
Time taken: 8.783 seconds
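
To double-check that the job launched zero map tasks (rather than map tasks
that silently emitted nothing), the job status and counters can be pulled
from the command line; I believe "hadoop job -status" prints the map/reduce
completion and all job counters (same -Dmapred.job.tracker flag as the Kill
Command above):

hadoop job -Dmapred.job.tracker=pos01n:9001 -status job_201009070004_0001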



-- TaskTracker log (pos01n) during the failing job:

/************************************************************
STARTUP_MSG: Starting TaskTracker
STARTUP_MSG:   host = pos01n/192.168.36.240
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
2010-09-07 00:04:21,180 INFO org.mortbay.log: Logging to
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
org.mortbay.log.Slf4jLog
2010-09-07 00:04:21,282 INFO org.apache.hadoop.http.HttpServer: Port
returned by webServer.getConnectors()[0].getLocalPort() before open() is -1.
Opening the listener on 50060
2010-09-07 00:04:21,288 INFO org.apache.hadoop.http.HttpServer:
listener.getLocalPort() returned 50060
webServer.getConnectors()[0].getLocalPort() returned 50060
2010-09-07 00:04:21,288 INFO org.apache.hadoop.http.HttpServer: Jetty bound
to port 50060
2010-09-07 00:04:21,288 INFO org.mortbay.log: jetty-6.1.14
2010-09-07 00:04:28,541 INFO org.mortbay.log: Started
SelectChannelConnector@0.0.0.0:50060
2010-09-07 00:04:28,653 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=TaskTracker, sessionId=
2010-09-07 00:04:28,667 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=TaskTracker, port=54194
2010-09-07 00:04:28,706 INFO org.apache.hadoop.ipc.Server: IPC Server
Responder: starting
2010-09-07 00:04:28,708 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 54194: starting
2010-09-07 00:04:28,707 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 54194: starting
2010-09-07 00:04:28,709 INFO org.apache.hadoop.mapred.TaskTracker:
TaskTracker up at: localhost.localdomain/127.0.0.1:54194
2010-09-07 00:04:28,709 INFO org.apache.hadoop.mapred.TaskTracker: Starting
tracker tracker_pos01n.tripadvisor.com:localhost.localdomain/127.0.0.1:54194
2010-09-07 00:04:28,711 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 54194: starting
2010-09-07 00:04:56,363 INFO org.apache.hadoop.mapred.TaskTracker:  Using
MemoryCalculatorPlugin :
org.apache.hadoop.util.LinuxMemoryCalculatorPlugin@30e34726
2010-09-07 00:04:56,372 INFO org.apache.hadoop.mapred.TaskTracker: Starting
thread: Map-events fetcher for all reduce tasks on
tracker_pos01n.tripadvisor.com:localhost.localdomain/127.0.0.1:54194
2010-09-07 00:04:56,375 WARN org.apache.hadoop.mapred.TaskTracker:
TaskTracker's totalMemoryAllottedForTasks is -1. TaskMemoryManager is
disabled.
2010-09-07 00:04:56,376 INFO org.apache.hadoop.mapred.IndexCache: IndexCache
created with max memory = 10485760
2010-09-07 00:07:53,803 INFO org.apache.hadoop.mapred.TaskTracker:
LaunchTaskAction (registerTask): attempt_201009070004_0001_m_000001_0 task's
state:UNASSIGNED
2010-09-07 00:07:53,805 INFO org.apache.hadoop.mapred.TaskTracker: Trying to
launch : attempt_201009070004_0001_m_000001_0
2010-09-07 00:07:53,805 INFO org.apache.hadoop.mapred.TaskTracker: In
TaskLauncher, current free slots : 1 and trying to launch
attempt_201009070004_0001_m_000001_0
2010-09-07 00:07:54,432 WARN org.apache.hadoop.fs.FileSystem: "pos01n:54310"
is a deprecated filesystem name. Use "hdfs://pos01n:54310/" instead.
2010-09-07 00:07:54,468 WARN org.apache.hadoop.fs.FileSystem: "pos01n:54310"
is a deprecated filesystem name. Use "hdfs://pos01n:54310/" instead.
2010-09-07 00:07:54,703 WARN org.apache.hadoop.fs.FileSystem: "pos01n:54310"
is a deprecated filesystem name. Use "hdfs://pos01n:54310/" instead.
2010-09-07 00:07:54,847 WARN org.apache.hadoop.fs.FileSystem: "pos01n:54310"
is a deprecated filesystem name. Use "hdfs://pos01n:54310/" instead.
2010-09-07 00:07:54,946 INFO org.apache.hadoop.mapred.JvmManager: In
JvmRunner constructed JVM ID: jvm_201009070004_0001_m_-1107468038
2010-09-07 00:07:54,946 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner
jvm_201009070004_0001_m_-1107468038 spawned.
2010-09-07 00:07:55,364 INFO org.apache.hadoop.mapred.TaskTracker: JVM with
ID: jvm_201009070004_0001_m_-1107468038 given task:
attempt_201009070004_0001_m_000001_0
2010-09-07 00:07:55,699 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201009070004_0001_m_000001_0 0.0% setup
2010-09-07 00:07:55,701 INFO org.apache.hadoop.mapred.TaskTracker: Task
attempt_201009070004_0001_m_000001_0 is done.
2010-09-07 00:07:55,701 INFO org.apache.hadoop.mapred.TaskTracker: reported
output size for attempt_201009070004_0001_m_000001_0  was 0
2010-09-07 00:07:55,703 INFO org.apache.hadoop.mapred.TaskTracker:
addFreeSlot : current free slots : 1
2010-09-07 00:07:55,885 INFO org.apache.hadoop.mapred.JvmManager: JVM :
jvm_201009070004_0001_m_-1107468038 exited. Number of tasks it ran: 1
2010-09-07 00:07:56,805 INFO org.apache.hadoop.mapred.TaskTracker:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_201009070004_0001/attempt_201009070004_0001_m_000001_0/output/file.out
in any of the configured local directories
2010-09-07 00:07:56,834 INFO org.apache.hadoop.mapred.TaskTracker:
LaunchTaskAction (registerTask): attempt_201009070004_0001_m_000000_0 task's
state:UNASSIGNED
2010-09-07 00:07:56,835 INFO org.apache.hadoop.mapred.TaskTracker: Trying to
launch : attempt_201009070004_0001_m_000000_0
2010-09-07 00:07:56,835 INFO org.apache.hadoop.mapred.TaskTracker: In
TaskLauncher, current free slots : 1 and trying to launch
attempt_201009070004_0001_m_000000_0
2010-09-07 00:07:56,835 INFO org.apache.hadoop.mapred.TaskTracker: Received
KillTaskAction for task: attempt_201009070004_0001_m_000001_0
2010-09-07 00:07:56,836 INFO org.apache.hadoop.mapred.TaskTracker: About to
purge task: attempt_201009070004_0001_m_000001_0
2010-09-07 00:07:56,837 INFO org.apache.hadoop.mapred.TaskRunner:
attempt_201009070004_0001_m_000001_0 done; removing files.
2010-09-07 00:07:56,838 INFO org.apache.hadoop.mapred.IndexCache: Map ID
attempt_201009070004_0001_m_000001_0 not found in cache
2010-09-07 00:07:56,865 WARN org.apache.hadoop.fs.FileSystem: "pos01n:54310"
is a deprecated filesystem name. Use "hdfs://pos01n:54310/" instead.
2010-09-07 00:07:56,867 WARN org.apache.hadoop.fs.FileSystem: "pos01n:54310"
is a deprecated filesystem name. Use "hdfs://pos01n:54310/" instead.
2010-09-07 00:07:56,869 WARN org.apache.hadoop.fs.FileSystem: "pos01n:54310"
is a deprecated filesystem name. Use "hdfs://pos01n:54310/" instead.
2010-09-07 00:07:56,871 WARN org.apache.hadoop.fs.FileSystem: "pos01n:54310"
is a deprecated filesystem name. Use "hdfs://pos01n:54310/" instead.
2010-09-07 00:07:56,897 INFO org.apache.hadoop.mapred.JvmManager: In
JvmRunner constructed JVM ID: jvm_201009070004_0001_m_970995359
2010-09-07 00:07:56,897 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner
jvm_201009070004_0001_m_970995359 spawned.
2010-09-07 00:07:57,318 INFO org.apache.hadoop.mapred.TaskTracker: JVM with
ID: jvm_201009070004_0001_m_970995359 given task:
attempt_201009070004_0001_m_000000_0
2010-09-07 00:07:57,647 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201009070004_0001_m_000000_0 0.0%
2010-09-07 00:07:57,650 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201009070004_0001_m_000000_0 0.0% cleanup
2010-09-07 00:07:57,651 INFO org.apache.hadoop.mapred.TaskTracker: Task
attempt_201009070004_0001_m_000000_0 is done.
2010-09-07 00:07:57,651 INFO org.apache.hadoop.mapred.TaskTracker: reported
output size for attempt_201009070004_0001_m_000000_0  was 0
2010-09-07 00:07:57,652 INFO org.apache.hadoop.mapred.TaskTracker:
addFreeSlot : current free slots : 1
2010-09-07 00:07:57,823 INFO org.apache.hadoop.mapred.JvmManager: JVM :
jvm_201009070004_0001_m_970995359 exited. Number of tasks it ran: 1
2010-09-07 00:07:59,837 INFO org.apache.hadoop.mapred.TaskTracker:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_201009070004_0001/attempt_201009070004_0001_m_000000_0/output/file.out
in any of the configured local directories
2010-09-07 00:07:59,871 INFO org.apache.hadoop.mapred.TaskTracker: Received
'KillJobAction' for job: job_201009070004_0001
2010-09-07 00:07:59,871 INFO org.apache.hadoop.mapred.TaskRunner:
attempt_201009070004_0001_m_000000_0 done; removing files.
2010-09-07 00:07:59,872 INFO org.apache.hadoop.mapred.IndexCache: Map ID
attempt_201009070004_0001_m_000000_0 not found in cache



-- The HBase sequential-write test does populate its table (as seen via the
hbase shell). After running the following:

hadoop org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 3

30 mappers ran with no errors, producing this output:
...
10/09/07 00:18:57 INFO mapred.JobClient:  map 100% reduce 28%
10/09/07 00:19:12 INFO mapred.JobClient:  map 100% reduce 30%
10/09/07 00:19:24 INFO mapred.JobClient:  map 100% reduce 100%
10/09/07 00:19:32 INFO mapred.JobClient: Job complete: job_201009070004_0003
10/09/07 00:19:32 INFO mapred.JobClient: Counters: 17
10/09/07 00:19:32 INFO mapred.JobClient:   HBase Performance Evaluation
10/09/07 00:19:32 INFO mapred.JobClient:     Row count=3145710
10/09/07 00:19:32 INFO mapred.JobClient:     Elapsed time in
milliseconds=2277702
10/09/07 00:19:32 INFO mapred.JobClient:   Job Counters
10/09/07 00:19:32 INFO mapred.JobClient:     Launched reduce tasks=1
10/09/07 00:19:32 INFO mapred.JobClient:     Launched map tasks=30
10/09/07 00:19:32 INFO mapred.JobClient:   FileSystemCounters
10/09/07 00:19:32 INFO mapred.JobClient:     FILE_BYTES_READ=546
10/09/07 00:19:32 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2226
10/09/07 00:19:32 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=414
10/09/07 00:19:32 INFO mapred.JobClient:   Map-Reduce Framework
10/09/07 00:19:32 INFO mapred.JobClient:     Reduce input groups=30
10/09/07 00:19:32 INFO mapred.JobClient:     Combine output records=0
10/09/07 00:19:32 INFO mapred.JobClient:     Map input records=30
10/09/07 00:19:32 INFO mapred.JobClient:     Reduce shuffle bytes=696
10/09/07 00:19:32 INFO mapred.JobClient:     Reduce output records=30
10/09/07 00:19:32 INFO mapred.JobClient:     Spilled Records=60
10/09/07 00:19:32 INFO mapred.JobClient:     Map output bytes=480
10/09/07 00:19:32 INFO mapred.JobClient:     Combine input records=0
10/09/07 00:19:32 INFO mapred.JobClient:     Map output records=30
10/09/07 00:19:32 INFO mapred.JobClient:     Reduce input records=30
10/09/07 00:19:32 INFO zookeeper.ZooKeeper: Closing session:
0x22aea5e6d8d0001
10/09/07 00:19:32 INFO zookeeper.ClientCnxn: Closing ClientCnxn for session:
0x22aea5e6d8d0001
10/09/07 00:19:32 INFO zookeeper.ClientCnxn: Exception while closing send
thread for session 0x22aea5e6d8d0001 : Read error rc = -1
java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
10/09/07 00:19:32 INFO zookeeper.ClientCnxn: Disconnecting ClientCnxn for
session: 0x22aea5e6d8d0001
10/09/07 00:19:32 INFO zookeeper.ZooKeeper: Session: 0x22aea5e6d8d0001
closed
10/09/07 00:19:32 INFO zookeeper.ClientCnxn: EventThread shut down
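
(The "as seen via hbase shell" check above was along these lines;
PerformanceEvaluation writes into 'TestTable', if I have that right:)

hbase shell
hbase(main):001:0> count 'TestTable'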







-- Copying the table with a CTAS shows the same behavior (the MapReduce
stage finishes with no map tasks, and no rows are copied):

hive> CREATE TABLE test
    > AS SELECT * from hbase_table_1
    > ;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201009062337_0002, Tracking URL =
http://pos01n:50030/jobdetails.jsp?jobid=job_201009062337_0002
Kill Command = /hadoop/bin/../bin/hadoop job
 -Dmapred.job.tracker=pos01n:9001 -kill job_201009062337_0002
2010-09-06 23:42:09,566 Stage-1 map = 0%,  reduce = 0%
2010-09-06 23:42:12,582 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201009062337_0002
Ended Job = 1321600821, job is filtered out (removed at runtime).
Moving data to:
hdfs://pos01n:54310/data1/hive_scratchdir/hive_2010-09-06_23-42-03_055_2351688256300251976/-ext-10001
Moving data to: /user/hive/warehouse/test
OK
Time taken: 9.694 seconds
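
The CTAS target comes out empty, matching the direct reads. To inspect
what, if anything, landed there, listing the path from the "Moving data to"
line above should work:

hadoop fs -ls /user/hive/warehouse/test

Meanwhile, a plain "select *" against the HBase table still returns rows:
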
hive> select * from hbase_table_1 limit 3;
OK
1802051275 0000b87c1142193304e47e97cf981fc9
1802051477 00209a5ea0e2524b1fccb8cdd9b4836b
1802051645 00100073215fb9b53c8c5e0b1e571cf4
Time taken: 0.111 seconds




On Mon, Sep 6, 2010 at 5:16 PM, John Sichi <jsi...@facebook.com> wrote:

> Hmmm, anything interesting in the task logs?  Seems like somehow the task
> tracker nodes can't see the HBase table whereas the client node can, but
> then I would expect to see an error instead of zero rows.
>
> JVS
>
> On Sep 4, 2010, at 4:36 PM, phil young wrote:
>
> I can confirm the HBase table is populated via "SELECT *" or the hbase
> shell.
> But, when I read or copy the table via a mapreduce job, there are no rows
> returned.
>
> I'm hoping someone would recognize this as some sort of configuration
> problem.
> The stack is: Hadoop 0.20.2, HBase 0.20.3, and Hive from the trunk ~8/20.
>
> Here are the statements that show the problem...
>
>
>
> hive> select * from hbase_table_1 limit 5;
> OK
> 500184511 033ee0111f22bbf5786f80df3d163834
> 500184512 030c23751e42fa5e01d05daf5a028e8b
> 500184516 01945892c252a55da843c692f4b1bd77
> 500184542 0078d187207d1f1777524b027f826b19
> 500184662 036e9bd88dba12bfc6943f417d29302f
> Time taken: 0.087 seconds
>
>
> hive> select key, value from hbase_table_1 limit 5;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_201009041301_0030, Tracking URL =
> http://pos01n:50030/jobdetails.jsp?jobid=job_201009041301_0030
> Kill Command = /hadoop/bin/../bin/hadoop job
>  -Dmapred.job.tracker=pos01n:9001 -kill job_201009041301_0030
> 2010-09-04 19:04:34,673 Stage-1 map = 0%,  reduce = 0%
> 2010-09-04 19:04:37,685 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_201009041301_0030
> OK
> Time taken: 8.386 seconds
>
>
> hive> describe extended hbase_table_1;
> OK
> key      int      from deserializer
> value    string   from deserializer
>
> Detailed Table Information Table(tableName:hbase_table_1, dbName:default,
> owner:root, createTime:1283637617, lastAccessTime:0, retention:0,
> sd:StorageDescriptor(cols:[FieldSchema(name:key, type:int, comment:null),
> FieldSchema(name:value, type:string, comment:null)],
> location:hdfs://pos01n:54310/user/hive/warehouse/hbase_table_1,
> inputFormat:org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat,
> outputFormat:org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat,
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.hbase.HBaseSerDe,
> parameters:{serialization.format=1, hbase.columns.mapping=:key,cf1:val}),
> bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[], parameters:{
> hbase.table.name=xyz, transient_lastDdlTime=1283637617,
> storage_handler=org.apache.hadoop.hive.hbase.HBaseStorageHandler},
> viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
>
>
> Of course, I appreciate the help. Hopefully I'll find HBase can solve my
> problem, become a user, and be able to return the favor some day ;)
>
>
>
>
>
