I can give it a go. I'm not convinced, though, as we are working
with CSV files and don't touch sequence files at all at the moment.

We are using the Cloudera Ubuntu packages for Hadoop 0.20.1+133 and Hive 0.4.0.


On 25 Jan 2010, at 15:30, Jay Booth wrote:

> Actually, we had an issue with this. It was a bug in SequenceFile: if there
> was a problem opening a file, it would leave a file handle open and never
> close it.
> 
> Here's the patch. It's already fixed in 0.21/trunk; if I get some time this
> week I'll submit it against 0.20.2. Could you apply it to Hadoop and let
> me know if it fixes things for you?
> 
> On Mon, Jan 25, 2010 at 10:11 AM, Jay Booth <[email protected]> wrote:
> Yeah, I'd guess that this is a Hive issue, although it could be a
> combination. Maybe if you're running queries and then closing your Thrift
> connection before reading all the results, Hive doesn't know what to do and
> leaves the connection open? Once the West Coast folks wake up, they might
> have a better answer for you than I do.
> 
> 
> On Mon, Jan 25, 2010 at 9:06 AM, Andy Kent <[email protected]> wrote:
> On 25 Jan 2010, at 13:59, Jay Booth wrote:
> 
>> That's the datanode port. If I had to guess, Hive is connecting to DFS
>> directly for some reason (maybe for "select *" queries?) and not finishing
>> its reads or closing the connections afterwards.
> 
> 
> Thanks for the response.
> 
> That's what I was suspecting. I have triple-checked our Ruby code, and it
> is definitely closing its Thrift connections properly.
> 
> I'll try running some different queries and see if I can suss out some
> examples of which ones are leaky. Is this something I should file in
> JIRA, or is it a known issue? I can't believe other people haven't noticed
> this.
> 
> 
> <SequenceFile.patch>
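For context, the class of bug Jay describes (a reader whose constructor opens a file, then fails while reading the header and never closes the handle) can be sketched in plain Java. This is a hypothetical illustration only: it is not the attached SequenceFile.patch and not Hadoop's SequenceFile.Reader code. The class name LeakSafeReader and the three-byte "SEQ" header check are assumptions made up for the example.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/**
 * Hypothetical reader showing the "close on failed open" pattern.
 * Not Hadoop's SequenceFile.Reader and not the attached patch.
 */
public class LeakSafeReader implements AutoCloseable {
    // Hypothetical 3-byte magic header, loosely modelled on "SEQ".
    private static final byte[] MAGIC = {'S', 'E', 'Q'};

    private final DataInputStream in;

    public LeakSafeReader(InputStream raw) throws IOException {
        DataInputStream stream = new DataInputStream(raw);
        boolean ok = false;
        try {
            // May throw on a corrupt, truncated or non-matching file.
            readHeader(stream);
            ok = true;
        } finally {
            if (!ok) {
                // The leak: without this close, a failed header read
                // leaves the underlying handle open forever.
                stream.close();
            }
        }
        this.in = stream;
    }

    private static void readHeader(DataInputStream stream) throws IOException {
        byte[] magic = new byte[MAGIC.length];
        stream.readFully(magic); // EOFException if the file is too short
        if (!Arrays.equals(magic, MAGIC)) {
            throw new IOException("unrecognised file: bad header");
        }
    }

    /** Reads one big-endian int after the header. */
    public int readInt() throws IOException {
        return in.readInt();
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

On success the stream stays open and the caller owns it (for example via try-with-resources); only the failure path closes it inside the constructor, since the caller never receives an object it could close.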
