This is a known issue in Hive Server. The same metastore client is being used
to issue both queries, and JDBC does not handle that well. We should use
thread-specific or session-specific metastore clients, but I don't think Hive
Server does that right now. HIVE-584 is supposed to fix this issue.
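
To illustrate what "thread-specific clients" could look like, here is a minimal sketch using a plain Java ThreadLocal. It is not the Hive Server code and not the HIVE-584 patch: FakeMetastoreClient, client() and distinctClientsAcrossThreads() are hypothetical names made up for this example, and the fake client only carries an id so that the per-thread behaviour is visible.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class PerThreadClientDemo {

    // Hypothetical stand-in for a metastore client; NOT the real Hive class.
    static final class FakeMetastoreClient {
        private static final AtomicInteger NEXT_ID = new AtomicInteger();
        final int id = NEXT_ID.incrementAndGet();
    }

    // Each thread lazily creates its own client on first use and reuses it afterwards.
    private static final ThreadLocal<FakeMetastoreClient> CLIENT =
            ThreadLocal.withInitial(FakeMetastoreClient::new);

    static FakeMetastoreClient client() {
        return CLIENT.get();
    }

    // Starts n threads and returns how many distinct clients they observed.
    static int distinctClientsAcrossThreads(int n) throws InterruptedException {
        Set<Integer> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[n];
        for (int i = 0; i < n; i++) {
            workers[i] = new Thread(() -> {
                // Two lookups on the same thread return the same instance,
                // so both calls add the same id to the set.
                seen.add(client().id);
                seen.add(client().id);
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return seen.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("distinct clients: " + distinctClientsAcrossThreads(4));
    }
}
```

With one client per worker thread, two queries running at the same time never share client state. A session-specific variant would key the client on the connection or session rather than on the thread; which of the two fits Hive Server better is not something this sketch settles.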

________________________________
From: Matt Pestritto <[email protected]>
Reply-To: <[email protected]>
Date: Tue, 28 Jul 2009 10:48:24 -0700
To: <[email protected]>
Subject: Problem with Thrift Server Concurrency

Hi all

Does the Thrift server support concurrency?  I'm having a problem that only
happens if I fire off multiple (2+) DML queries at the same time.
Randomly, one of the queries will succeed but the other will fail with the
following error, which I pulled from the hiveserver output:

java.io.IOException: cannot find dir = hdfs://mustique:9000/user/hadoop/mantis-output/mantis-job/20090601 in partToPartitionInfo!
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getTableDescFromPath(HiveInputFormat.java:311)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.validateInput(HiveInputFormat.java:288)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:735)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:388)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:357)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:263)
    at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:108)
    at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:302)
    at org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:290)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:252)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

If I execute the queries via thrift a few seconds apart from each other, it
succeeds.  It only seems to fail if the queries start at about the same
time.

When I run the same two queries using *hive -e "query 1" & hive -e "query 2"*,
it also works fine.

Any ideas?

Thanks
-Matt
