[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated MAPREDUCE-5724:
------------------------------------------

    Attachment: MAPREDUCE-5724.patch

Trying to do something like YARN-24 for the JHS is a bit more complicated.

Instead, I've taken a different approach:

On startup the JHS will try to create the history directories; if it cannot 
because the FS is not available or is in safe mode, it will retry for up to 
2 minutes, and if it times out it will then shut down.

So, instead of failing immediately, the JHS will wait for a while for the FS 
to become available.

I've hardcoded the 2-minute timeout as I don't think we need to introduce a 
config value for this. If others feel otherwise, I can update the patch with 
a config property for it.
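
For reference, a minimal sketch of the retry idea (not the actual patch): the 
class and method names below are hypothetical, and the real change would live 
in HistoryFileManager#serviceInit / mkdir. The sketch only retries on the 
connection-refused case shown in the stack trace below; the patch would also 
need to cover the safe-mode case.

{code:java}
import java.io.IOException;
import java.net.ConnectException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

// Hypothetical helper class; the real logic belongs in HistoryFileManager.
public class DoneDirBootstrap {

  private static final long TIMEOUT_MS = 2 * 60 * 1000;      // hardcoded 2 minutes
  private static final long RETRY_INTERVAL_MS = 10 * 1000;   // wait between attempts

  /**
   * Tries to create the history "done" directory, retrying while the
   * FileSystem is unreachable. If the directory still cannot be created when
   * the timeout expires, the exception is rethrown so the caller can shut
   * the JobHistoryServer down.
   */
  public static void createDoneDirWithRetry(Configuration conf, Path doneDir)
      throws IOException, InterruptedException {
    FileContext fc = FileContext.getFileContext(doneDir.toUri(), conf);
    long deadline = System.currentTimeMillis() + TIMEOUT_MS;
    while (true) {
      try {
        if (!fc.util().exists(doneDir)) {
          fc.mkdir(doneDir, new FsPermission((short) 0770), true);
        }
        return; // directory exists or was created successfully
      } catch (ConnectException e) {
        // FS not reachable yet; keep retrying until the deadline, then give up.
        if (System.currentTimeMillis() >= deadline) {
          throw e;
        }
        Thread.sleep(RETRY_INTERVAL_MS);
      }
    }
  }
}
{code}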

> JobHistoryServer does not start if HDFS is not running
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-5724
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5724
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 3.0.0, 2.4.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>            Priority: Critical
>         Attachments: MAPREDUCE-5724.patch
>
>
> Starting JHS without HDFS running fails with the following error:
> {code}
> STARTUP_MSG:   build = git://git.apache.org/hadoop-common.git -r 
> ad74e8850b99e03b0b6435b04f5b3e9995bc3956; compiled by 'tucu' on 
> 2014-01-14T22:40Z
> STARTUP_MSG:   java = 1.7.0_45
> ************************************************************/
> 2014-01-14 16:47:40,264 INFO 
> org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: registered UNIX signal 
> handlers for [TERM, HUP, INT]
> 2014-01-14 16:47:40,883 WARN org.apache.hadoop.util.NativeCodeLoader: Unable 
> to load native-hadoop library for your platform... using builtin-java classes 
> where applicable
> 2014-01-14 16:47:41,101 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: 
> JobHistory Init
> 2014-01-14 16:47:41,710 INFO org.apache.hadoop.service.AbstractService: 
> Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state 
> INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error 
> creating done directory: 
> [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done]
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done 
> directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done]
>       at 
> org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:505)
>       at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>       at 
> org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94)
>       at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>       at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>       at 
> org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:143)
>       at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>       at 
> org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:207)
>       at 
> org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:217)
> Caused by: java.net.ConnectException: Call From dontknow.local/172.20.10.4 to 
> localhost:8020 failed on connection exception: java.net.ConnectException: 
> Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>       at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
>       at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1359)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>       at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
>       at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>       at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:671)
>       at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1722)
>       at org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:124)
>       at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1106)
>       at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1102)
>       at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>       at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1102)
>       at org.apache.hadoop.fs.FileContext$Util.exists(FileContext.java:1514)
>       at 
> org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.mkdir(HistoryFileManager.java:561)
>       at 
> org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:502)
>       ... 8 more
> Caused by: java.net.ConnectException: Connection refused
>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>       at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
>       at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>       at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
>       at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
>       at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:601)
>       at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:696)
>       at org.apache.hadoop.ipc.Client$Connection.access$2700(Client.java:367)
>       at org.apache.hadoop.ipc.Client.getConnection(Client.java:1458)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1377)
>       ... 28 more
> 2014-01-14 16:47:41,713 INFO org.apache.hadoop.service.AbstractService: 
> Service org.apache.hadoop.mapreduce.v2.hs.JobHistory failed in state INITED; 
> cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating 
> done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done]
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done 
> directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done]
>       at 
> org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:505)
>       at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>       at 
> org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94)
>       at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>       at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>       at 
> org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:143)
>       at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>       at 
> org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:207)
>       at 
> org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:217)
> Caused by: java.net.ConnectException: Call From dontknow.local/172.20.10.4 to 
> localhost:8020 failed on connection exception: java.net.ConnectException: 
> Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>       at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
>       at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1359)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>       at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
>       at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>       at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:671)
>       at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1722)
>       at org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:124)
>       at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1106)
>       at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1102)
>       at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>       at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1102)
>       at org.apache.hadoop.fs.FileContext$Util.exists(FileContext.java:1514)
>       at 
> org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.mkdir(HistoryFileManager.java:561)
>       at 
> org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:502)
>       ... 8 more
> Caused by: java.net.ConnectException: Connection refused
>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>       at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
>       at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>       at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
>       at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
>       at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:601)
>       at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:696)
>       at org.apache.hadoop.ipc.Client$Connection.access$2700(Client.java:367)
>       at org.apache.hadoop.ipc.Client.getConnection(Client.java:1458)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1377)
>       ... 28 more
> 2014-01-14 16:47:41,714 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: 
> Stopping JobHistory
> 2014-01-14 16:47:41,714 INFO org.apache.hadoop.service.AbstractService: 
> Service org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer failed in state 
> INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error 
> creating done directory: 
> [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done]
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
