As per the discussions, 3 GB of memory and a 50-minute timeout are the upper
limits on Travis's side.
I have tried running the different test cases in parallel groups (so that
each thread gets its own 50 minutes), but that did not help.
We will have to fit within these limits if we plan to go ahead with
Travis.

Yash


On Fri, Jun 27, 2014 at 11:55 PM, Yash Sharma <[email protected]> wrote:

> I am currently trying to find all the test cases that are specifically
> breaking our build. Here are three of them; all of them hit OOM after the
> time shown:
>
> TestExampleQueries         10 minutes
> TestTpchSingleMode         50 minutes
> TestReverseImplicitCast    50 minutes
>
> Travis currently allows 3 GB for the entire build.
> If we are exceeding this, we should also look at how to reduce the memory
> footprint. Building the code in Jenkins/Travis while skipping the test
> cases is not very convincing either.
>
> I am chatting with the Travis team in another window. Is there anything
> specific you would like me to discuss?
>
> Thoughts?
>
> Yash
>
>
>
>
> On Fri, Jun 27, 2014 at 11:27 PM, Jacques Nadeau <[email protected]>
> wrote:
>
>> Sounds like the Travis instance doesn't have enough memory to run our
>> tests.
>>
>>
>> On Fri, Jun 27, 2014 at 6:08 AM, Yash Sharma <[email protected]> wrote:
>>
>> > A *single-forked* surefire run doesn't help. Neither does an *unbounded
>> > timeout* limit.
>> >
>> > Some part of the tests is stuck in an infinite wait state and ends with
>> > an OOM. Here is the full log for the Travis build with no max memory
>> > limit on the JVM:
>> >
>> > https://api.travis-ci.org/jobs/28583768/log.txt?deansi=true
>> >
>> >
>> >
>> >
>> > > Exception in thread "0d42f23c-c1a8-417a-bee5-672d4449ebac:frag:0:0 - Producer Thread" java.lang.OutOfMemoryError: Direct buffer memory
>> > >       at java.nio.Bits.reserveMemory(Bits.java:658)
>> > >       at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
>> > >       at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
>> > >       at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:434)
>> > >       at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179)
>> > >       at io.netty.buffer.PoolArena.allocate(PoolArena.java:168)
>> > >       at io.netty.buffer.PoolArena.allocate(PoolArena.java:98)
>> > >       at io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:46)
>> > >       at io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:66)
>> > >       at org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:144)
>> > >       at org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:151)
>> > >       at org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:306)
>> > >       at org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:158)
>> > >       at org.apache.drill.exec.vector.AllocationHelper.allocate(AllocationHelper.java:31)
>> > >       at org.apache.drill.exec.physical.impl.ScanBatch$Mutator.allocate(ScanBatch.java:281)
>> > >       at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:137)
>> > >       at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:112)
>> > >       at org.apache.drill.exec.physical.impl.producer.ProducerConsumerBatch$Producer.run(ProducerConsumerBatch.java:122)
>> > >       at java.lang.Thread.run(Thread.java:744)
>> > >
>> > >
>> > > No output has been received in the last 10 minutes, this potentially indicates a stalled build or something wrong with the build itself.
>> > >
>> > >
>> >
>> >
>> >
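The trace above shows the failure coming from ByteBuffer.allocateDirect (reached through Netty's pooled allocator), so it is the JVM's direct-memory cap that runs out, not the heap. Even with no explicit limit, HotSpot still caps direct buffers at roughly the maximum heap size. Below is a minimal sketch for printing the limits a test JVM is actually running with; it assumes a HotSpot-based JDK 7 or later, where `com.sun.management.HotSpotDiagnosticMXBean` is available, and the class name `EffectiveMemoryLimits` is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/**
 * Prints the memory limits the current JVM is running with.
 * Assumes a HotSpot-based JVM (Oracle JDK or OpenJDK, Java 7+).
 */
public class EffectiveMemoryLimits {

    // Returns the value of a HotSpot -XX option as a string,
    // e.g. "0" for MaxDirectMemorySize when it was not set explicitly.
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("max heap (MB): " + maxHeapMb);
        // A value of 0 means the flag was not set; HotSpot then caps direct
        // buffers at roughly the maximum heap size.
        System.out.println("MaxDirectMemorySize (bytes): " + vmOption("MaxDirectMemorySize"));
        System.out.println("JVM arguments: "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```

Running this inside the forked test JVM (for example from a trivial test) would show whether the limits actually in effect on Travis match the flags surefire passes.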
>> > On Fri, Jun 27, 2014 at 10:24 AM, Jacques Nadeau <[email protected]>
>> > wrote:
>> >
>> > > I believe you're getting killed for excessive memory consumption.
>> > > Dropping to a single surefire fork should help.
>> > > On Jun 26, 2014 8:43 PM, "Yash Sharma" <[email protected]> wrote:
>> > >
>> > > > The build still failed on Travis after increasing the timeout to
>> > > > 200000 ms. We need to find an appropriate value for it.
>> > > > It fails with this error, which typically appears in the timeout case:
>> > > >
>> > > >
>> > > > Failed to execute goal
>> > > > org.apache.maven.plugins:maven-surefire-plugin:2.17:test
>> > > > (default-test) on project drill-java-exec: ExecutionException:
>> > > > java.lang.RuntimeException: The forked VM terminated without properly
>> > > > saying goodbye. VM crash or System.exit called?
>> > > > [ERROR] Command was /bin/sh -c cd
>> > > > /home/travis/build/yssharma/incubator-drill/exec/java-exec &&
>> > > > /usr/lib/jvm/java-7-oracle/jre/bin/java -Xms512m -Xmx2g
>> > > > -Ddrill.exec.http.enabled=false
>> > > > -Ddrill.exec.sys.store.provider.local.write=false -XX:MaxPermSize=256M
>> > > > -XX:MaxDirectMemorySize=2096M -XX:+CMSClassUnloadingEnabled -jar
>> > > > /home/travis/build/yssharma/incubator-drill/exec/java-exec/target/surefire/surefirebooter5087846151760741500.jar
>> > > >
>> > > >
>> > > > Will keep digging.
>> > > >
>> > > >
>> > > > @Jacques: I will re-submit the command line configurable patch soon.
>> > > > Will also dig into the surefire forks you mentioned.
>> > > >
>> > > >
>> > > >
>> > > > Yash
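As a rough check on why the fork may be killed: the surefire command above lets the forked JVM grow its heap (-Xmx2g), direct memory (-XX:MaxDirectMemorySize=2096M) and permanent generation (-XX:MaxPermSize=256M) independently, and those caps alone add up to more than the 3 GB Travis allows, before counting the Maven process that launched the fork. A small sketch of that arithmetic follows; the class and method names are hypothetical, and thread stacks, code cache and other native memory are ignored, so the real footprint would be higher.

```java
/**
 * Worst-case memory budget for the forked surefire JVM, using the flags from
 * the command quoted above. Assumes each region can grow to its configured
 * maximum; thread stacks, code cache and other native memory are ignored.
 */
public class ForkMemoryBudget {

    // Converts a HotSpot-style size such as "2g", "2096M" or "256M" to megabytes.
    static long toMegabytes(String size) {
        char unit = Character.toLowerCase(size.charAt(size.length() - 1));
        long value = Long.parseLong(size.substring(0, size.length() - 1));
        switch (unit) {
            case 'g':
                return value * 1024;
            case 'm':
                return value;
            default:
                throw new IllegalArgumentException("Unsupported size: " + size);
        }
    }

    // Sums the configured maxima, in megabytes.
    static long worstCaseMegabytes(String... sizes) {
        long total = 0;
        for (String size : sizes) {
            total += toMegabytes(size);
        }
        return total;
    }

    public static void main(String[] args) {
        long total = worstCaseMegabytes(
                "2g",     // -Xmx2g
                "2096M",  // -XX:MaxDirectMemorySize=2096M
                "256M");  // -XX:MaxPermSize=256M
        long travisLimit = toMegabytes("3g"); // limit quoted in this thread
        System.out.printf("worst case %d MB vs limit %d MB (over by %d MB)%n",
                total, travisLimit, total - travisLimit);
    }
}
```

Running it prints `worst case 4400 MB vs limit 3072 MB (over by 1328 MB)`, which is consistent with the suspicion that the fork is being killed for memory use rather than for time.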
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > On Fri, Jun 27, 2014 at 12:39 AM, Yash Sharma <[email protected]>
>> > > > wrote:
>> > > >
>> > > > > It was not failing because of the git plugin; rather, it's because of
>> > > > > how Travis takes the clone. Travis uses git clone --depth=50 for faster
>> > > > > builds.
>> > > > >
>> > > > > I was able to get a clean build with the help of Travis team member
>> > > > > Hiro. The current build is still running on my box. Will share the
>> > > > > status on completion.
>> > > > >
>> > > > > Have added a JIRA for this, will add a patch soon:
>> > > > > https://issues.apache.org/jira/browse/DRILL-1083
>> > > > >
>> > > > > Peace,
>> > > > > Yash
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Wed, Jun 25, 2014 at 10:00 AM, Jacques Nadeau <[email protected]>
>> > > > > wrote:
>> > > > >
>> > > > >> Yeah, we have to disable tests on the Apache hardware as our tests are
>> > > > >> too hungry. I'm trying to get some alternatives to work. If someone
>> > > > >> wanted to try to figure out whether we could run on Travis with fork 1,
>> > > > >> that would be great. Right now it's failing because of the git plugin.
>> > > > >> You can try Travis on your local fork to find a config that works.
>> > > > >> On Jun 24, 2014 8:04 PM, "Yash Sharma" <[email protected]> wrote:
>> > > > >>
>> > > > >> > The final build, #57, with tests skipped, was successful.
>> > > > >> >
>> > > > >> > Most of the tests in builds #55 and #56 failed with a TimedOut
>> > > > >> > exception. The other exceptions were IllegalState (Child level
>> > > > >> > allocators not closed), plus one InterruptedException, which probably
>> > > > >> > occurred only because of the test case termination.
>> > > > >> >
>> > > > >> >
>> > > > >> > On Wed, Jun 25, 2014 at 2:59 AM, Timothy Chen <[email protected]>
>> > > > >> > wrote:
>> > > > >> >
>> > > > >> > > Looks like lots of tests timed out and errored?
>> > > > >> > >
>> > > > >> > > Tim
>> > > > >> > >
>> > > > >> > > On Tue, Jun 24, 2014 at 11:53 AM, Yash Sharma <[email protected]>
>> > > > >> > > wrote:
>> > > > >> > > > *fingers-crossed* :)
>> > > > >> > > >
>> > > > >> > > >
>> > > > >> > > > On Wed, Jun 25, 2014 at 12:19 AM, Jacques Nadeau <[email protected]>
>> > > > >> > > > wrote:
>> > > > >> > > >
>> > > > >> > > >> I kicked off another build with clean install. Good catch.
>> > > > >> > > >> Hopefully that will put things back on track.
>> > > > >> > > >>
>> > > > >> > > >>
>> > > > >> > > >> On Tue, Jun 24, 2014 at 11:46 AM, Yash Sharma <[email protected]>
>> > > > >> > > >> wrote:
>> > > > >> > > >>
>> > > > >> > > >> > I am not able to reproduce the exact error right now, but I can
>> > > > >> > > >> > see that it was related to the DRILL-1024 commit, where the
>> > > > >> > > >> > hive-storage code was moved out of java-exec. The *drillOI*
>> > > > >> > > >> > definition has moved from config.fmpp (java-exec) to config.fmpp
>> > > > >> > > >> > (hive-exec).
>> > > > >> > > >> >
>> > > > >> > > >> > The Jenkins build was still failing in java-exec, which means the
>> > > > >> > > >> > old ObjectInspectorHelper class was still present and was probably
>> > > > >> > > >> > looking for the tdd definition in config.fmpp (java-exec).
>> > > > >> > > >> >
>> > > > >> > > >> > Jenkins used 'mvn install' rather than 'mvn clean install', so it
>> > > > >> > > >> > may still have been referring to the old ObjectInspectorHelper
>> > > > >> > > >> > class.
>> > > > >> > > >> >
>> > > > >> > > >> > Still not sure. Will try to reproduce the exact error.
>> > > > >> > > >> >
>> > > > >> > > >> > Yash
>> > > > >> > > >> >
>> > > > >> > > >> >
>> > > > >> > > >> >
>> > > > >> > > >> >
>> > > > >> > > >> >
>> > > > >> > > >> > On Tue, Jun 24, 2014 at 10:46 PM, Jacques Nadeau <[email protected]>
>> > > > >> > > >> > wrote:
>> > > > >> > > >> >
>> > > > >> > > >> > > Hey guys,
>> > > > >> > > >> > >
>> > > > >> > > >> > > I just saw that the build on Jenkins is failing. Is any
>> > > > >> > > >> > > committer interested in troubleshooting it?
>> > > > >> > > >> > >
>> > > > >> > > >> > > https://builds.apache.org/job/drill-scm/54
>> > > > >> > > >> > >
