Hi Dave,

There may be two issues here:

1) According to the HDFS JIRA, this seems to be a bug that was introduced in Hadoop 0.18 and was fixed in 0.20.2 by another Hadoop JIRA suggested by Hairong. You may want to try 0.20.2 if that's possible.

2) The lease expiration seems to be caused by the open file not receiving any writes for a long time. From your query, the TRANSFORM script is called for each row that is going to be written to the HDFS file. Does the Python script cause long latency (that is, a long time between two consecutive output records from the transform script)?
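One rough way to check the latency in point 2 is to time the gaps between output records inside the script itself. The sketch below is only an illustration: `emit_with_gap_tracking` is a hypothetical helper, not part of Hive or of your reduce.py, and it assumes the script's output rows can be produced as an iterable of tab-separated strings.

```python
import sys
import time


def emit_with_gap_tracking(records, out=sys.stdout, clock=time.time):
    """Write each record to `out` and return the longest gap, in seconds,
    observed between two consecutive writes.

    `records` is any iterable of already-formatted output rows
    (hypothetical: tab-separated strings without a trailing newline).
    """
    max_gap = 0.0
    last = clock()  # the first gap is measured from the start of the call
    for record in records:
        now = clock()
        max_gap = max(max_gap, now - last)
        last = now
        out.write(record + "\n")
        out.flush()  # hand the row over immediately instead of buffering it
    return max_gap
```

If the script wrote the returned value to stderr at the end (for example `sys.stderr.write("max gap: %.1fs\n" % gap)`), it should show up in the task logs, assuming your setup keeps the transform script's stderr there. A gap of many minutes would point to the script rather than to Hive.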

On Oct 1, 2010, at 8:34 AM, Dave Brondsema wrote:

We're trying to insert into a table, using a dynamic partition, but the query runs for a while and then dies with a LeaseExpiredException. The hadoop details & some discussion is at https://issues.apache.org/jira/browse/HDFS-198

Is there a way to configure hive, or our query, to work around this? If we adjust our query to handle less data at once, it can complete in under 10 minutes, but then we have to run the query many more times to get all the data processed.

The query is:

FROM (
  FROM (
    SELECT file, os, country, dt, project
    FROM downloads
    WHERE dt='2010-10-01'
    DISTRIBUTE BY project
    SORT BY project asc, file asc
  ) a
  SELECT TRANSFORM(file, os, country, dt, project)
  USING 'transformwrap reduce.py'
  AS (file, downloads, os, country, project)
) b
INSERT OVERWRITE TABLE dl_day PARTITION (dt='2010-10-01', project)
SELECT file, downloads, os, country, FALSE, project

The project partition has roughly 100000 values.

We're using Hive trunk from about a month ago.
Hadoop 0.18.3-14.cloudera.CH0_3

--
Dave Brondsema
Software Engineer
Geeknet
www.geek.net