[ 
https://issues.apache.org/jira/browse/HIVE-13507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250463#comment-15250463
 ] 

Sergio Peña commented on HIVE-13507:
------------------------------------

hi [~sseth], I will need to revert this patch, as it is causing issues with the 
ptest infra.
While I was running some tests, I found that ptest was spinning up a lot of 
instances due to an exception:

{noformat}
2016-04-20 13:01:25 INFO  CloudExecutionContextProvider:213 - Attempting to 
create 12 nodes
2016-04-20 13:02:34 INFO  CloudExecutionContextProvider:281 - Verify number of 
hots: 1
2016-04-20 13:02:34 INFO  CloudExecutionContextProvider:291 - Verifying node: 
{id=us-west-1/i-b245ef07, providerId=i-b245ef07, 
name=spena-hive-spark-ptest-slaves-b245ef07, location={scope=ZONE, 
id=us-west-1c, description=us-west-1c, parent=us-west-1, iso3166Codes=[US-CA]}, 
group=spena-hive-spark-ptest-slaves, imageId=us-west-1/ami-1ac6dc5f, 
os={family=unrecognized, arch=paravirtual, version=, 
description=360379543683/hive-spark-ptest-7, is64Bit=true}, 
status=RUNNING[running], loginPort=22, hostname=ip-10-236-128-180, 
privateAddresses=[10.236.128.180], publicAddresses=[54.241.234.115], 
hardware={id=c3.2xlarge, providerId=c3.2xlarge, processors=[{cores=8.0, 
speed=3.5}], ram=15360, volumes=[{type=LOCAL, size=80.0, device=/dev/sdb, 
bootDevice=false, durable=false}, {type=LOCAL, size=80.0, device=/dev/sdc, 
bootDevice=false, durable=false}, {id=vol-df82d662, type=SAN, device=/dev/sda1, 
bootDevice=true, durable=true}], hypervisor=xen, 
supportsImage=And(ALWAYS_TRUE,Or(isWindows(),requiresVirtualizationType(paravirtual)),ALWAYS_TRUE,is64Bit())},
 loginUser=root, tags=[group=spena-hive-spark-ptest-slaves], 
userMetadata={owner=sergio.pena, Name=spena-hive-spark-ptest-slaves-b245ef07}}
2016-04-20 13:02:34 INFO  CloudExecutionContextProvider:45 - Starting 
LocalCommandId=ssh -v -i /home/hiveptest/.ssh/hive-ptest-user-key  -l hiveptest 
54.241.234.115 'pkill -f java': {}1
2016-04-20 13:02:35 INFO  CloudExecutionContextProvider:60 - Finished 
LocalCommandId=1. ElapsedTime(seconds)=0
2016-04-20 13:02:35 ERROR CloudExecutionContextProvider:296 - Node 
{id=us-west-1/i-b245ef07, providerId=i-b245ef07, 
name=spena-hive-spark-ptest-slaves-b245ef07, location={scope=ZONE, 
id=us-west-1c, description=us-west-1c, parent=us-west-1, iso3166Codes=[US-CA]}, 
group=spena-hive-spark-ptest-slaves, imageId=us-west-1/ami-1ac6dc5f, 
os={family=unrecognized, arch=paravirtual, version=, 
description=360379543683/hive-spark-ptest-7, is64Bit=true}, 
status=RUNNING[running], loginPort=22, hostname=ip-10-236-128-180, 
privateAddresses=[10.236.128.180], publicAddresses=[54.241.234.115], 
hardware={id=c3.2xlarge, providerId=c3.2xlarge, processors=[{cores=8.0, 
speed=3.5}], ram=15360, volumes=[{type=LOCAL, size=80.0, device=/dev/sdb, 
bootDevice=false, durable=false}, {type=LOCAL, size=80.0, device=/dev/sdc, 
bootDevice=false, durable=false}, {id=vol-df82d662, type=SAN, device=/dev/sda1, 
bootDevice=true, durable=true}], hypervisor=xen, 
supportsImage=And(ALWAYS_TRUE,Or(isWindows(),requiresVirtualizationType(paravirtual)),ALWAYS_TRUE,is64Bit())},
 loginUser=root, tags=[group=spena-hive-spark-ptest-slaves], 
userMetadata={owner=sergio.pena, Name=spena-hive-spark-ptest-slaves-b245ef07}} 
is bad on startup
java.lang.IllegalStateException: This stopwatch is already stopped.
        at 
com.google.common.base.Preconditions.checkState(Preconditions.java:150) 
~[guava-15.0.jar:?]
        at com.google.common.base.Stopwatch.stop(Stopwatch.java:177) 
~[guava-15.0.jar:?]
        at 
org.apache.hive.ptest.execution.LocalCommand.getExitCode(LocalCommand.java:59) 
~[LocalCommand.class:?]
        at 
org.apache.hive.ptest.execution.ssh.SSHCommandExecutor.execute(SSHCommandExecutor.java:72)
 ~[SSHCommandExecutor.class:?]
        at 
org.apache.hive.ptest.execution.context.CloudExecutionContextProvider$3.run(CloudExecutionContextProvider.java:293)
 [CloudExecutionContextProvider$3.class:?]
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
[?:1.7.0_45]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_45]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[?:1.7.0_45]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[?:1.7.0_45]
        at java.lang.Thread.run(Thread.java:744) [?:1.7.0_45]
2016-04-20 13:02:35 INFO  CloudExecutionContextProvider:354 - Submitting 
termination for {id=us-west-1/i-b245ef07, providerId=i-b245ef07, 
name=spena-hive-spark-ptest-slaves-b245ef07, location={scope=ZONE, 
id=us-west-1c, description=us-west-1c, parent=us-west-1, iso3166Codes=[US-CA]}, 
group=spena-hive-spark-ptest-slaves, imageId=us-west-1/ami-1ac6dc5f, 
os={family=unrecognized, arch=paravirtual, version=, 
description=360379543683/hive-spark-ptest-7, is64Bit=true}, 
status=RUNNING[running], loginPort=22, hostname=ip-10-236-128-180, 
privateAddresses=[10.236.128.180], publicAddresses=[54.241.234.115], 
hardware={id=c3.2xlarge, providerId=c3.2xlarge, processors=[{cores=8.0, 
speed=3.5}], ram=15360, volumes=[{type=LOCAL, size=80.0, device=/dev/sdb, 
bootDevice=false, durable=false}, {type=LOCAL, size=80.0, device=/dev/sdc, 
bootDevice=false, durable=false}, {id=vol-df82d662, type=SAN, device=/dev/sda1, 
bootDevice=true, durable=true}], hypervisor=xen, 
supportsImage=And(ALWAYS_TRUE,Or(isWindows(),requiresVirtualizationType(paravirtual)),ALWAYS_TRUE,is64Bit())},
 loginUser=root, tags=[group=spena-hive-spark-ptest-slaves], 
userMetadata={owner=sergio.pena, Name=spena-hive-spark-ptest-slaves-b245ef07}}
2016-04-20 13:02:35 INFO  CloudExecutionContextProvider:226 - Successfully 
created 0 nodes
2016-04-20 13:02:35 INFO  CloudExecutionContextProvider:233 - Pausing creation 
process for 60 seconds
2016-04-20 13:03:35 INFO  CloudExecutionContextProvider:213 - Attempting to 
create 12 nodes
{noformat}

As you can see, due to the error, 0 nodes were created (and the bad nodes are 
supposed to be terminated), but for some reason Amazon is not terminating them, 
so this got stuck in a loop for a long time.

I don't know what is causing the error, but I reverted the patch locally on the 
ptest server, and everything is working normally again.
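The stack trace above points at Guava's Stopwatch precondition check: stop() may only be called while the watch is running, so a second stop() call throws IllegalStateException with exactly the message in the log. I'm not certain this is what the patch introduced, but as an illustration, here is a minimal sketch of that behavior (using a stand-in Stopwatch class rather than Guava itself, so it runs standalone), mimicking what could happen if LocalCommand.getExitCode() ends up stopping the same stopwatch twice:

```java
public class StopwatchDemo {
    // Stand-in for com.google.common.base.Stopwatch; Guava guards stop()
    // with Preconditions.checkState(isRunning, ...).
    static class Stopwatch {
        private boolean isRunning;
        private long startTick;
        private long elapsedNanos;

        Stopwatch start() {
            if (isRunning) {
                throw new IllegalStateException("This stopwatch is already running.");
            }
            isRunning = true;
            startTick = System.nanoTime();
            return this;
        }

        Stopwatch stop() {
            // This is the check that fires in the trace above:
            if (!isRunning) {
                throw new IllegalStateException("This stopwatch is already stopped.");
            }
            elapsedNanos += System.nanoTime() - startTick;
            isRunning = false;
            return this;
        }
    }

    public static void main(String[] args) {
        Stopwatch sw = new Stopwatch().start();
        sw.stop();      // first stop: fine
        try {
            sw.stop();  // second stop: throws, like the log shows
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If something like this is the cause, the fix would be to either guard getExitCode() so it only stops the watch once, or check Stopwatch.isRunning() before calling stop().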

> Improved logging for ptest
> --------------------------
>
>                 Key: HIVE-13507
>                 URL: https://issues.apache.org/jira/browse/HIVE-13507
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Sergio Peña
>             Fix For: 2.1.0
>
>         Attachments: HIVE-13507.01.patch
>
>
> Include information about batch runtimes, outlier lists, host completion 
> times, etc. Try identifying tests which cause the build to take a long time 
> while holding onto resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)