Did this help? I'm running into a similar problem: slowly leaking
connections to 50010, and after a Hive restart all is OK again.
Andy Kent wrote:
I can give it a go. I'm not convinced though, as we are working
with CSV files and don't touch sequence files at all at the moment.
We are using the Cloudera Ubuntu packages for Hadoop 0.20.1+133 and Hive 0.4.0.
On 25 Jan 2010, at 15:30, Jay Booth wrote:
Actually, we had an issue with this: it was a bug in SequenceFile where, if
there were problems opening a file, it would leave a filehandle open and never
close it.
Here's the patch -- it's already fixed in 0.21/trunk, and if I get some time this
week I'll submit it against 0.20.2. Could you apply this to Hadoop and let me
know if it fixes things for you?
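For reference, the failure mode is roughly what's sketched below: a stream is opened, header validation throws, and the stream (and its socket to the datanode on 50010) is never closed. This is only a minimal illustration of the close-on-failure pattern, not the attached patch itself; the helper name is made up.

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SafeOpen {
  // Hypothetical helper illustrating close-on-failure; names are ours, not Hadoop's.
  public static FSDataInputStream openChecked(FileSystem fs, Path file) throws IOException {
    FSDataInputStream in = fs.open(file);
    boolean ok = false;
    try {
      // ... read and validate the header here; previously any IOException thrown
      // at this point propagated out while 'in' (and its datanode socket) stayed open ...
      ok = true;
      return in;
    } finally {
      if (!ok) {
        in.close();  // release the filehandle/socket if setup failed
      }
    }
  }
}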
On Mon, Jan 25, 2010 at 10:11 AM, Jay Booth
<[email protected]> wrote:
Yeah, I'd guess that this is a Hive issue, although it could be a combination...
maybe if you're doing queries and then closing your Thrift connection before
reading all results, Hive doesn't know what to do and leaves the connection
open? Once the west coast folks wake up, they might have a better answer for
you than I do.
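If it helps to test that theory from the Java side, here's a rough sketch of the "read everything, then close" pattern, assuming the 0.4-era ThriftHive client (execute()/fetchAll()) talking to a HiveServer on localhost:10000; the table name is just a placeholder, and the Ruby client would follow the same shape.

import java.util.List;
import org.apache.hadoop.hive.service.ThriftHive;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

public class HiveFetchExample {
  public static void main(String[] args) throws Exception {
    TSocket transport = new TSocket("localhost", 10000);
    transport.open();
    ThriftHive.Client client = new ThriftHive.Client(new TBinaryProtocol(transport));
    try {
      client.execute("SELECT * FROM my_table LIMIT 100");  // placeholder query
      List<String> rows = client.fetchAll();               // drain all results before closing
      for (String row : rows) {
        System.out.println(row);
      }
    } finally {
      transport.close();  // always close, even if the fetch fails partway
    }
  }
}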
On Mon, Jan 25, 2010 at 9:06 AM, Andy Kent
<[email protected]> wrote:
On 25 Jan 2010, at 13:59, Jay Booth wrote:
That's the datanode port... if I had to guess, Hive's connecting to DFS directly for some
reason (maybe for "select *" queries?) and not finishing its reads or closing
the connections afterwards.
Thanks for the response.
That's what I was suspecting. I have triple-checked our Ruby code and it is
definitely closing its Thrift connections properly.
I'll try running some different queries and see if I can suss out some examples
of which ones are leaky. Is this something I should post to JIRA, or is it
a known issue? I can't believe other people haven't noticed this.
<SequenceFile.patch>