Hey thrift & hcat gurus - We've also noticed this OOM issue when processing corrupt thrift messages. We're attempting to work around this issue as follows (see https://github.com/kevinweil/elephant-bird/pull/239/files#L5R45):

    @Override
    public void deserialize(TBase base, byte[] bytes) throws TException {
      // set upper bound on bytes available so that protocol does not try
      // to allocate and read large amounts of data in case of corrupt input
      protocol.setReadLength(bytes.length);
      super.deserialize(base, bytes);
    }

Would it make sense to call setReadLength directly in TDeserializer.deserialize?
https://github.com/apache/thrift/blob/trunk/lib/java/src/org/apache/thrift/TDeserializer.java#L60

--travis

On Wed, Aug 29, 2012 at 4:00 AM, Joshi, Rekha <rekha_jo...@intuit.com> wrote:
> Thanks for confirming Agateaa.
>
> Since the Hcat server behaves normally, and you observed the issue in your
> log just once, it does drop in a concern for me at the moment.
> Also not sure if it is CMS related/environment related behavior. At some point
> of time I might try to replicate your system, and update you if I face this
> too.
>
> However cc-ing to thrift dev mailing list as well, as there are some known
> libthrift/TBinaryProtocol issues in line with yours -
> https://issues.apache.org/jira/browse/THRIFT-1643
>
> Thanks
> Rekha
>
> From: agateaaa <agate...@gmail.com>
> Reply-To: <hcatalog-user@incubator.apache.org>
> Date: Tue, 28 Aug 2012 07:50:00 -0700
> To: <hcatalog-user@incubator.apache.org>
> Subject: Re: HCatalog Thrift Error
>
> Hi Rekha
>
> Yes, the hcatalog server was up and still running. I can query tables via
> pig scripts and also run hive queries. As a matter of fact it's still
> running.
>
> Before I applied a patch for THRIFT-1468 I had seen my server crash
> frequently under similar circumstances (OutOfMemory). Since the patch I
> haven't seen any crashes (just that error once).
>
> I did take java heap dump just after I saw the error and did not see any
> increase in the heap size.
> I read in GC tuning docs that if full gc is taking longer (taking 98% of
> time), JVM may throw that OutOfMemory error - but I am not really sure (I am
> using CMS so I am not sure if that applies)
>
> I can check if I get same error as THRIFT-1205
>
> Isn't HIVE-2715 the same as fixing THRIFT-1468 (at least in terms of its
> resolution)?
>
> Thanks
> A
>
> On Tue, Aug 28, 2012 at 2:33 AM, Joshi, Rekha <rekha_jo...@intuit.com> wrote:
> Hi Agateaa,
>
> Impressive bug description.
>
> Can you confirm HCat server was up (in spite of thread dump/GC) and for all
> practical purposes commands were getting executed in a normal fashion for a
> fairly good time after the GC issues were noticed on log?
> Unless there is a self-healing effect built-in :-) /timeout after which the
> error is automatically invalid/system is reset/space is reclaimed, there must
> be a way it would have directly impacted the system, and not just known
> because one checks the log.
>
> I do not have the same patched environment as yours, but would you care to
> unpatch THRIFT-1468 and then check if your system bug behavior is in sync
> with -
> https://issues.apache.org/jira/browse/THRIFT-1205
> https://issues.apache.org/jira/browse/THRIFT-1468
> https://issues.apache.org/jira/browse/HIVE-2715
>
> Or especially since you did not enter arbitrary data, can you confirm you get
> the usual behavior if you do enter arbitrary data?
>
> Thanks
> Rekha
>
> From: agateaaa <agate...@gmail.com>
> Reply-To: <hcatalog-user@incubator.apache.org>
> Date: Mon, 27 Aug 2012 10:38:01 -0700
> To: <hcatalog-user@incubator.apache.org>
> Subject: Re: HCatalog Thrift Error
>
> Correction:
>
> I have a fairly small server (VM) 1GB RAM and 1 CPU and using HCatalog
> Version 0.4, Hive 0.9 (patched for HIVE-3008) with Thrift 0.7 (patched for
> THRIFT-1468)
>
> On Mon, Aug 27, 2012 at 10:27 AM, agateaaa <agate...@gmail.com> wrote:
> Hi,
>
> I got this error over the weekend in the hcat.err log file.
>
> Noticed that at approximately the same time a Full GC was happening in the
> gc logs.
>
> Exception in thread "pool-1-thread-200" java.lang.OutOfMemoryError: Java heap space
>     at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:353)
>     at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:215)
>     at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:81)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> Exception in thread "pool-1-thread-201" java.lang.OutOfMemoryError: Java heap space
>     at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:353)
>     at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:215)
>     at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:81)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> Exception in thread "pool-1-thread-202" java.lang.OutOfMemoryError: Java heap space
>     at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:353)
>     at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:215)
>     at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:81)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> Exception in thread "pool-1-thread-203" java.lang.OutOfMemoryError: Java heap space
>     at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:353)
>     at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:215)
>     at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:81)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
>
> I noticed that the hcatalog server had not shut down, and I don't see any
> other abnormality in the logs
>
> Searching led me to these two thrift issues
> https://issues.apache.org/jira/browse/THRIFT-601
> https://issues.apache.org/jira/browse/THRIFT-1205
>
> Only difference is that in my case HCatalog server did not crash and I wasn't
> trying to send any arbitrary data to the thrift server at the telnet port
>
> I have a fairly small server (VM) 1GB RAM and 1 CPU and using HCatalog
> Version 0.4, Hive 0.9 (patched HIVE-3008) with Thrift 0.7 (patched for
> THRIFT-1438)
>
> Has anyone seen this before?
>
> Thanks
> - A
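
For anyone reading this thread later, here is a rough sketch of the idea behind the setReadLength workaround at the top: the traces show the allocation happening in readStringBody, which sizes its buffer from a length read off the wire, so a corrupt length can request a huge array. The sketch below does not use any Thrift classes. It assumes a 4-byte big-endian length prefix followed by the string bytes, and the names BoundedReadSketch, readString and frame are made up for illustration. It only shows the check "never allocate more than the bytes actually available", not how Thrift implements it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a length-prefixed protocol reader; not Thrift code.
final class BoundedReadSketch {

    // Reads a 4-byte big-endian length, then that many UTF-8 bytes.
    // The declared length is checked against the bytes really available
    // BEFORE allocating, so a corrupt prefix cannot trigger a huge allocation.
    static String readString(ByteBuffer in) {
        if (in.remaining() < 4) {
            throw new IllegalArgumentException("truncated length prefix");
        }
        int declared = in.getInt();
        if (declared < 0 || declared > in.remaining()) {
            throw new IllegalArgumentException(
                "declared length " + declared + " but only "
                    + in.remaining() + " bytes available");
        }
        byte[] body = new byte[declared];
        in.get(body);
        return new String(body, StandardCharsets.UTF_8);
    }

    // Builds a well-formed frame: length prefix followed by the UTF-8 bytes.
    static byte[] frame(String s) {
        byte[] body = s.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + body.length)
            .putInt(body.length)
            .put(body)
            .array();
    }

    public static void main(String[] args) {
        // A valid frame round-trips.
        System.out.println(readString(ByteBuffer.wrap(frame("hello"))));

        // A corrupt frame claims about 2 GB but carries only two bytes.
        byte[] corrupt = {0x7f, (byte) 0xff, (byte) 0xff, (byte) 0xff, 'h', 'i'};
        try {
            readString(ByteBuffer.wrap(corrupt));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Without the check, the corrupt frame would go straight to new byte[declared], which is the kind of allocation that shows up as "Java heap space" in the traces above. Bounding by bytes.length works in a deserializer because the whole message is already in memory; a socket-based server would presumably need a configured maximum instead.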