I can try and give it a go. I'm not convinced though, as we are working with CSV files and don't touch SequenceFiles at all at the moment.
We are using the Cloudera Ubuntu packages for Hadoop 0.20.1+133 and Hive 0.4.0.

On 25 Jan 2010, at 15:30, Jay Booth wrote:

> Actually, we had an issue with this, it was a bug in SequenceFile where if
> there were problems opening a file, it would leave a filehandle open and
> never close it.
>
> Here's the patch -- it's already fixed in 0.21/trunk; if I get some time this
> week I'll submit it against 0.20.2 -- could you apply this to Hadoop and let
> me know if it fixes things for you?
>
> On Mon, Jan 25, 2010 at 10:11 AM, Jay Booth <[email protected]> wrote:
> Yeah, I'd guess that this is a Hive issue, although it could be a
> combination.. maybe if you're doing queries and then closing your Thrift
> connection before reading all results, Hive doesn't know what to do and
> leaves the connection open? Once the west coast folks wake up, they might
> have a better answer for you than I do.
>
> On Mon, Jan 25, 2010 at 9:06 AM, Andy Kent <[email protected]> wrote:
> On 25 Jan 2010, at 13:59, Jay Booth wrote:
>
>> That's the datanode port.. if I had to guess, Hive's connecting to DFS
>> directly for some reason (maybe for "select *" queries?) and not finishing
>> its reads or closing the connections after.
>
> Thanks for the response.
>
> That's what I was suspecting. I have triple-checked our Ruby code and it
> is definitely closing its Thrift connections properly.
>
> I'll try running some different queries and see if I can suss out some
> examples of which ones are leaky. Is this something that I should post to
> Jira or is it a known issue? I can't believe other people haven't noticed
> this?

<SequenceFile.patch>
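For what it's worth, the failure mode Jay describes (an exception partway through opening a file leaking the just-acquired filehandle) is easy to sketch. The classes below are hypothetical stand-ins for illustration only, not the actual SequenceFile code or the contents of the patch:

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-in for an input stream: it just records whether
// close() was ever called, so a leak is observable.
class TrackingStream implements Closeable {
    boolean closed = false;
    @Override public void close() { closed = true; }
}

// Sketch of an open path whose initialisation can fail after a
// filehandle has already been acquired -- the shape of the bug.
class SafeOpen {
    static TrackingStream lastOpened; // exposed only so the demo can inspect the handle

    static TrackingStream open(boolean failDuringInit) throws IOException {
        TrackingStream s = new TrackingStream(); // filehandle acquired here
        lastOpened = s;
        try {
            if (failDuringInit) {
                throw new IOException("corrupt header"); // simulated failure mid-open
            }
            return s;
        } catch (IOException e) {
            s.close(); // the fix: release the handle before rethrowing
            throw e;
        }
    }
}

public class LeakDemo {
    public static void main(String[] args) throws Exception {
        SafeOpen.open(false);
        System.out.println("clean open, closed=" + SafeOpen.lastOpened.closed);

        try {
            SafeOpen.open(true);
        } catch (IOException expected) {
            // without the close() in the catch block above, this handle would
            // stay open forever -- exactly the leak described
        }
        System.out.println("failed open, closed=" + SafeOpen.lastOpened.closed);
    }
}
```

The unfixed version simply omits the `s.close()` in the catch block, so every failed open leaves one handle dangling until the process exits, which matches the gradually accumulating open filehandles reported here.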
