[
https://issues.apache.org/jira/browse/ZOOKEEPER-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lin Yiqun updated ZOOKEEPER-2315:
---------------------------------
Attachment: ZOOKEEPER-2315.002.patch
Thanks for the review, [~michim]. I addressed your comment: I defined a {warInfo} that is
used for both the exception message and the warn log.
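For illustration, here is a minimal sketch of the pattern used in the patch (the class and
method names are placeholders of mine, not the actual ClientCnxn code): the message string is
built once and shared by the WARN-level log call and the exception.
{code}
import java.io.IOException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch only: illustrates building the warning text once ("warnInfo") and
// reusing it for both the WARN log and the exception. Class and method names
// are assumptions, not the real ZooKeeper client source.
public class SessionExpiryLogSketch {
    private static final Logger LOG =
            LoggerFactory.getLogger(SessionExpiryLogSketch.class);

    static void onSessionExpired(long sessionId) throws IOException {
        String warnInfo = "Unable to reconnect to ZooKeeper service, session 0x"
                + Long.toHexString(sessionId) + " has expired";
        // Previously this message was logged at INFO level; the patch raises it to WARN.
        LOG.warn(warnInfo + ", closing socket connection");
        // The same text is reused in the exception instead of being built twice.
        throw new IOException(warnInfo);
    }
}
{code}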
> Change client connect zk service timeout log level from Info to Warn level
> --------------------------------------------------------------------------
>
> Key: ZOOKEEPER-2315
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2315
> Project: ZooKeeper
> Issue Type: Wish
> Components: java client
> Affects Versions: 3.4.6
> Reporter: Lin Yiqun
> Priority: Minor
> Fix For: 3.4.7, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2315.001.patch, ZOOKEEPER-2315.002.patch
>
>
> Recently the ResourceManager of my Hadoop cluster failed suddenly, so I looked into
> the ResourceManager log. But the log did not directly point me to the reason, until I
> found the ZooKeeper timeout INFO log record.
> {code}
> 2015-11-06 06:34:11,257 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode:
> Assigned container container_1446016482901_292094_01_000140 of capacity
> <memory:1024, vCores:1> on host mofa2089:41361, which has 30 containers,
> <memory:31744, vCores:30> used and <memory:9216, vCores:10> available after
> allocation
> 2015-11-06 06:34:11,266 INFO org.apache.zookeeper.ClientCnxn: Unable to
> reconnect to ZooKeeper service, session 0x24f4fd5118e5c6e has expired,
> closing socket connection
> 2015-11-06 06:34:11,271 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_1446016482901_292094_01_000105 Container Transitioned from RUNNING
> to COMPLETED
> 2015-11-06 06:34:11,271 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt:
> Completed container: container_1446016482901_292094_01_000105 in state:
> COMPLETED event:FINISHED
> 2015-11-06 06:34:11,271 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dongwei
> OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS
> APPID=application_1446016482901_292094
> CONTAINERID=container_1446016482901_292094_01_000105
> 2015-11-06 06:34:11,271 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode:
> Released container container_1446016482901_292094_01_000105 of capacity
> <memory:1024, vCores:1> on host mofa010079:50991, which currently has 29
> containers, <memory:33792, vCores:29> used and <memory:7168, vCores:11>
> available, release resources=true
> 2015-11-06 06:34:11,271 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Application attempt appattempt_1446016482901_292094_000001 released container
> container_1446016482901_292094_01_000105 on node: host: mofa010079:50991
> #containers=29 available=<memory:7168, vCores:11> used=<memory:33792,
> vCores:29> with event: FINISHED
> 2015-11-06 06:34:11,272 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_1446016482901_292094_01_000141 Container Transitioned from NEW to
> ALLOCATED
> 2015-11-06 06:34:11,272 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dongwei
> OPERATION=AM Allocated Container TARGET=SchedulerApp
> RESULT=SUCCESS APPID=application_1446016482901_292094
> CONTAINERID=container_1446016482901_292094_01_000141
> 2015-11-06 06:34:11,272 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode:
> Assigned container container_1446016482901_292094_01_000141 of capacity
> <memory:1024, vCores:1> on host mofa010079:50991, which has 30 containers,
> <memory:34816, vCores:30> used and <memory:6144, vCores:10> available after
> allocation
> 2015-11-06 06:34:11,295 WARN
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher:
>
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread
> interrupted. Returning.
> 2015-11-06 06:34:11,296 INFO org.apache.hadoop.ipc.Server: Stopping server on
> 8032
> 2015-11-06 06:34:11,297 INFO org.apache.hadoop.ipc.Server: Stopping IPC
> Server Responder
> 2015-11-06 06:34:11,297 INFO org.apache.hadoop.ipc.Server: Stopping server on
> 8030
> 2015-11-06 06:34:11,297 INFO org.apache.hadoop.ipc.Server: Stopping IPC
> Server listener on 8032
> 2015-11-06 06:34:11,298 INFO org.apache.hadoop.ipc.Server: Stopping IPC
> Server Responder
> 2015-11-06 06:34:11,298 INFO org.apache.hadoop.ipc.Server: Stopping server on
> 8031
> 2015-11-06 06:34:11,298 INFO org.apache.hadoop.ipc.Server: Stopping IPC
> Server listener on 8030
> 2015-11-06 06:34:11,300 INFO org.apache.hadoop.ipc.Server: Stopping IPC
> Server listener on 8031
> 2015-11-06 06:34:11,300 INFO org.apache.hadoop.ipc.Server: Stopping IPC
> Server Responder
> {code}
> The problem was solved, but it is too difficult to find the ZooKeeper service connection
> timeout message among so many INFO log records, and such records are easy to overlook.
> So we should change the log level of these ZooKeeper session timeout messages from INFO
> to WARN.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)