[ https://issues.apache.org/jira/browse/HIVE-13507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250463#comment-15250463 ]
Sergio Peña commented on HIVE-13507:
------------------------------------
Hi [~sseth], I need to revert this patch because it is causing issues with the
ptest infrastructure.
While I was running some tests, I found that ptest keeps spinning up many
instances because of an exception:
{noformat}
2016-04-20 13:01:25 INFO CloudExecutionContextProvider:213 - Attempting to create 12 nodes
2016-04-20 13:02:34 INFO CloudExecutionContextProvider:281 - Verify number of hots: 1
2016-04-20 13:02:34 INFO CloudExecutionContextProvider:291 - Verifying node:
{id=us-west-1/i-b245ef07, providerId=i-b245ef07,
name=spena-hive-spark-ptest-slaves-b245ef07, location={scope=ZONE,
id=us-west-1c, description=us-west-1c, parent=us-west-1, iso3166Codes=[US-CA]},
group=spena-hive-spark-ptest-slaves, imageId=us-west-1/ami-1ac6dc5f,
os={family=unrecognized, arch=paravirtual, version=,
description=360379543683/hive-spark-ptest-7, is64Bit=true},
status=RUNNING[running], loginPort=22, hostname=ip-10-236-128-180,
privateAddresses=[10.236.128.180], publicAddresses=[54.241.234.115],
hardware={id=c3.2xlarge, providerId=c3.2xlarge, processors=[{cores=8.0,
speed=3.5}], ram=15360, volumes=[{type=LOCAL, size=80.0, device=/dev/sdb,
bootDevice=false, durable=false}, {type=LOCAL, size=80.0, device=/dev/sdc,
bootDevice=false, durable=false}, {id=vol-df82d662, type=SAN, device=/dev/sda1,
bootDevice=true, durable=true}], hypervisor=xen,
supportsImage=And(ALWAYS_TRUE,Or(isWindows(),requiresVirtualizationType(paravirtual)),ALWAYS_TRUE,is64Bit())},
loginUser=root, tags=[group=spena-hive-spark-ptest-slaves],
userMetadata={owner=sergio.pena, Name=spena-hive-spark-ptest-slaves-b245ef07}}
2016-04-20 13:02:34 INFO CloudExecutionContextProvider:45 - Starting LocalCommandId=ssh -v -i /home/hiveptest/.ssh/hive-ptest-user-key -l hiveptest 54.241.234.115 'pkill -f java': {}1
2016-04-20 13:02:35 INFO CloudExecutionContextProvider:60 - Finished LocalCommandId=1. ElapsedTime(seconds)=0
2016-04-20 13:02:35 ERROR CloudExecutionContextProvider:296 - Node
{id=us-west-1/i-b245ef07, providerId=i-b245ef07,
name=spena-hive-spark-ptest-slaves-b245ef07, location={scope=ZONE,
id=us-west-1c, description=us-west-1c, parent=us-west-1, iso3166Codes=[US-CA]},
group=spena-hive-spark-ptest-slaves, imageId=us-west-1/ami-1ac6dc5f,
os={family=unrecognized, arch=paravirtual, version=,
description=360379543683/hive-spark-ptest-7, is64Bit=true},
status=RUNNING[running], loginPort=22, hostname=ip-10-236-128-180,
privateAddresses=[10.236.128.180], publicAddresses=[54.241.234.115],
hardware={id=c3.2xlarge, providerId=c3.2xlarge, processors=[{cores=8.0,
speed=3.5}], ram=15360, volumes=[{type=LOCAL, size=80.0, device=/dev/sdb,
bootDevice=false, durable=false}, {type=LOCAL, size=80.0, device=/dev/sdc,
bootDevice=false, durable=false}, {id=vol-df82d662, type=SAN, device=/dev/sda1,
bootDevice=true, durable=true}], hypervisor=xen,
supportsImage=And(ALWAYS_TRUE,Or(isWindows(),requiresVirtualizationType(paravirtual)),ALWAYS_TRUE,is64Bit())},
loginUser=root, tags=[group=spena-hive-spark-ptest-slaves],
userMetadata={owner=sergio.pena, Name=spena-hive-spark-ptest-slaves-b245ef07}}
is bad on startup
java.lang.IllegalStateException: This stopwatch is already stopped.
	at com.google.common.base.Preconditions.checkState(Preconditions.java:150) ~[guava-15.0.jar:?]
	at com.google.common.base.Stopwatch.stop(Stopwatch.java:177) ~[guava-15.0.jar:?]
	at org.apache.hive.ptest.execution.LocalCommand.getExitCode(LocalCommand.java:59) ~[LocalCommand.class:?]
	at org.apache.hive.ptest.execution.ssh.SSHCommandExecutor.execute(SSHCommandExecutor.java:72) ~[SSHCommandExecutor.class:?]
	at org.apache.hive.ptest.execution.context.CloudExecutionContextProvider$3.run(CloudExecutionContextProvider.java:293) [CloudExecutionContextProvider$3.class:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [?:1.7.0_45]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_45]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_45]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_45]
	at java.lang.Thread.run(Thread.java:744) [?:1.7.0_45]
2016-04-20 13:02:35 INFO CloudExecutionContextProvider:354 - Submitting
termination for {id=us-west-1/i-b245ef07, providerId=i-b245ef07,
name=spena-hive-spark-ptest-slaves-b245ef07, location={scope=ZONE,
id=us-west-1c, description=us-west-1c, parent=us-west-1, iso3166Codes=[US-CA]},
group=spena-hive-spark-ptest-slaves, imageId=us-west-1/ami-1ac6dc5f,
os={family=unrecognized, arch=paravirtual, version=,
description=360379543683/hive-spark-ptest-7, is64Bit=true},
status=RUNNING[running], loginPort=22, hostname=ip-10-236-128-180,
privateAddresses=[10.236.128.180], publicAddresses=[54.241.234.115],
hardware={id=c3.2xlarge, providerId=c3.2xlarge, processors=[{cores=8.0,
speed=3.5}], ram=15360, volumes=[{type=LOCAL, size=80.0, device=/dev/sdb,
bootDevice=false, durable=false}, {type=LOCAL, size=80.0, device=/dev/sdc,
bootDevice=false, durable=false}, {id=vol-df82d662, type=SAN, device=/dev/sda1,
bootDevice=true, durable=true}], hypervisor=xen,
supportsImage=And(ALWAYS_TRUE,Or(isWindows(),requiresVirtualizationType(paravirtual)),ALWAYS_TRUE,is64Bit())},
loginUser=root, tags=[group=spena-hive-spark-ptest-slaves],
userMetadata={owner=sergio.pena, Name=spena-hive-spark-ptest-slaves-b245ef07}}
2016-04-20 13:02:35 INFO CloudExecutionContextProvider:226 - Successfully created 0 nodes
2016-04-20 13:02:35 INFO CloudExecutionContextProvider:233 - Pausing creation process for 60 seconds
2016-04-20 13:03:35 INFO CloudExecutionContextProvider:213 - Attempting to create 12 nodes
{noformat}
As you can see, because of the error, 0 nodes were created successfully (the
failing nodes are supposed to be terminated), but for some reason Amazon is not
terminating them, so ptest got stuck in this loop for a long time.
I don't know what is causing the error yet, but after I reverted the patch
locally on the ptest server, everything is working normally again.
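The trace shows Guava's `Stopwatch.stop()` failing its `checkState` precondition inside `LocalCommand.getExitCode()`, and the `Finished LocalCommandId=1. ElapsedTime(seconds)=0` line logged just before it suggests the stopwatch for that command had already been stopped once. One plausible reading, not confirmed here, is that `getExitCode()` is invoked more than once per command and stops the stopwatch unconditionally each time. The sketch below is a hypothetical stand-in, not the ptest code: `TimedCommandSketch` and `SimpleStopwatch` are invented names, and `SimpleStopwatch` only mimics the relevant Guava behavior (`stop()` throws `IllegalStateException` when not running, `isRunning()` reports state) so it runs without Guava on the classpath.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, NOT the real org.apache.hive.ptest.execution.LocalCommand.
final class TimedCommandSketch {

    // Minimal stand-in for Guava's Stopwatch: stop() throws if already stopped.
    static final class SimpleStopwatch {
        private final long startNanos = System.nanoTime();
        private long stopNanos;
        private boolean running = true;

        boolean isRunning() {
            return running;
        }

        void stop() {
            if (!running) {
                // Same message Guava uses in the trace above.
                throw new IllegalStateException("This stopwatch is already stopped.");
            }
            stopNanos = System.nanoTime();
            running = false;
        }

        long elapsed(TimeUnit unit) {
            long end = running ? System.nanoTime() : stopNanos;
            return unit.convert(end - startNanos, TimeUnit.NANOSECONDS);
        }
    }

    private final SimpleStopwatch stopwatch = new SimpleStopwatch();
    // Hypothetical: a real command would wait on a Process for its exit code.
    private final int exitCode;

    TimedCommandSketch(int exitCode) {
        this.exitCode = exitCode;
    }

    // Safe to call repeatedly: only the first call stops the stopwatch and logs.
    synchronized int getExitCode() {
        if (stopwatch.isRunning()) {
            stopwatch.stop();
            long seconds = stopwatch.elapsed(TimeUnit.SECONDS);
            System.out.println("Finished. ElapsedTime(seconds)=" + seconds);
        }
        return exitCode;
    }

    public static void main(String[] args) {
        TimedCommandSketch cmd = new TimedCommandSketch(0);
        System.out.println(cmd.getExitCode());
        // Without the isRunning() guard, this second call would throw.
        System.out.println(cmd.getExitCode());
    }
}
```

Guarding on `isRunning()` keeps the elapsed-time logging while making the getter idempotent; the alternative would be to stop the stopwatch in exactly one place (for example where the process is reaped) and have `getExitCode()` only read state.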
> Improved logging for ptest
> --------------------------
>
> Key: HIVE-13507
> URL: https://issues.apache.org/jira/browse/HIVE-13507
> Project: Hive
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Sergio Peña
> Fix For: 2.1.0
>
> Attachments: HIVE-13507.01.patch
>
>
> Include information about batch runtimes, outlier lists, host completion
> times, etc. Try identifying tests which cause the build to take a long time
> while holding onto resources.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)