Kylin ships its metadata to the MR job through the Hadoop distributed cache. The missing file "file:/home/q/hadoop/kylin/tomcat/temp/kylin_job_meta3892468167792432608/meta" should be present on machines B and D before YARN kicks off the mappers.
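To add some context on why only some nodes fail: a local resource registered with a "file:" URI is resolved against the local filesystem of whichever NodeManager localizes the container, not the machine that submitted the job, so it only works if the file exists on every node. Here is a toy Python sketch of that distinction (the function and data are hypothetical, just to illustrate the behavior; it is not Kylin or YARN code):

```python
from urllib.parse import urlparse

def localize(resource_uri, local_files):
    """Mimic NodeManager localization: a file: URI must exist on THIS node."""
    parsed = urlparse(resource_uri)
    if parsed.scheme == "file":
        if parsed.path not in local_files:
            # This mirrors the FileNotFoundException seen in the yarn log
            raise FileNotFoundError(f"File {resource_uri} does not exist")
        return parsed.path
    # hdfs:// resources are fetched from the cluster, so any node can get them
    return "/nm-local-dir/filecache/" + parsed.path.rsplit("/", 1)[-1]

# Machine A (where Kylin runs) has the meta dir; machines B and D do not.
meta = "file:/home/q/hadoop/kylin/tomcat/temp/kylin_job_meta3892468167792432608/meta"
files_on_A = {"/home/q/hadoop/kylin/tomcat/temp/kylin_job_meta3892468167792432608/meta"}
files_on_B = set()

print(localize(meta, files_on_A))   # succeeds on machine A
try:
    localize(meta, files_on_B)      # fails on machine B, as in the log
except FileNotFoundError as e:
    print(e)
```

An hdfs:// resource ("hdfs://.../job.jar" etc.) would localize on any node, which is why the staging-dir resources are normally not the problem.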
As to why the files were not there... I don't know.

On Wed, Jun 14, 2017 at 12:12 PM, Gavin_Chou <[email protected]> wrote:

> Hi, all:
> I have a problem while building a cube at step 2.
>
> This error appears in the yarn log:
>
> 2017-06-14 11:21:08,793 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1497364689294_0018 transitioned from NEW to INITING
> 2017-06-14 11:21:08,793 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1497364689294_0018_01_000001 to application application_1497364689294_0018
> 2017-06-14 11:21:08,793 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1497364689294_0018 transitioned from INITING to RUNNING
> 2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1497364689294_0018_01_000001 transitioned from NEW to LOCALIZING
> 2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1497364689294_0018
> 2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource file:/tmp/hadoop-yarn/staging/hadoop/.staging/job_1497364689294_0018/job.jar transitioned from INIT to DOWNLOADING
> 2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource file:/tmp/hadoop-yarn/staging/hadoop/.staging/job_1497364689294_0018/job.splitmetainfo transitioned from INIT to DOWNLOADING
> 2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource file:/tmp/hadoop-yarn/staging/hadoop/.staging/job_1497364689294_0018/job.split transitioned from INIT to DOWNLOADING
> 2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource file:/tmp/hadoop-yarn/staging/hadoop/.staging/job_1497364689294_0018/job.xml transitioned from INIT to DOWNLOADING
> 2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource file:/home/q/hadoop/kylin/tomcat/temp/kylin_job_meta3892468167792432608/meta transitioned from INIT to DOWNLOADING
> 2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1497364689294_0018_01_000001
> 2017-06-14 11:21:08,794 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ file:/home/q/hadoop/kylin/tomcat/temp/kylin_job_meta3892468167792432608/meta, 1497410467000, FILE, null }
> 2017-06-14 11:21:08,796 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /home/q/hadoop/hadoop/tmp/nm-local-dir/nmPrivate/container_1497364689294_0018_01_000001.tokens. Credentials list:
> 2017-06-14 11:21:08,796 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc { { file:/home/q/hadoop/kylin/tomcat/temp/kylin_job_meta3892468167792432608/meta, 1497410467000, FILE, null },pending,[(container_1497364689294_0018_01_000001)],781495827608056,DOWNLOADING}
> java.io.FileNotFoundException: File file:/home/q/hadoop/kylin/tomcat/temp/kylin_job_meta3892468167792432608/meta does not exist
>         at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
>         at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:397)
>         at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:250)
>         at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:353)
>         at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> 2017-06-14 11:21:08,796 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop
> 2017-06-14 11:21:08,797 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource file:/home/q/hadoop/kylin/tomcat/temp/kylin_job_meta3892468167792432608/meta(->/home/q/hadoop/hadoop/tmp/nm-local-dir/filecache/18/meta) transitioned from DOWNLOADING to FAILED
> 2017-06-14 11:21:08,797 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1497364689294_0018_01_000001 transitioned from LOCALIZING to LOCALIZATION_FAILED
> 2017-06-14 11:21:08,797 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: Container container_1497364689294_0018_01_000001 sent RELEASE event on a resource request { file:/home/q/hadoop/kylin/tomcat/temp/kylin_job_meta3892468167792432608/meta, 1497410467000, FILE, null } not present in cache.
> 2017-06-14 11:21:08,797 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hadoop OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: LOCALIZATION_FAILED APPID=application_1497364689294_0018 CONTAINERID=container_1497364689294_0018_01_000001
> 2017-06-14 11:21:08,797 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1497364689294_0018_01_000001 transitioned from LOCALIZATION_FAILED to DONE
>
> This error appears in the yarn-nodemanager logs of machines B and D. Before it, I found the following in the yarn-nodemanager log on machine C (Kylin is only installed on machine A):
>
> 2017-06-14 11:21:01,131 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1497364689294_0017_01_000002 transitioned from LOCALIZING to LOCALIZED
> 2017-06-14 11:21:01,146 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1497364689294_0017_01_000002 transitioned from LOCALIZED to RUNNING
> 2017-06-14 11:21:01,146 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Neither virutal-memory nor physical-memory monitoring is needed. Not running the monitor-thread
> 2017-06-14 11:21:01,149 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [nice, -n, 0, bash, /home/q/hadoop/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1497364689294_0017/container_1497364689294_0017_01_000002/default_container_executor.sh]
> 2017-06-14 11:21:05,024 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Stopping container with container Id: container_1497364689294_0017_01_000002
> 2017-06-14 11:21:05,025 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hadoop IP=10.90.181.160 OPERATION=Stop Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1497364689294_0017 CONTAINERID=container_1497364689294_0017_01_000002
> 2017-06-14 11:21:05,025 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1497364689294_0017_01_000002 transitioned from RUNNING to KILLING
> 2017-06-14 11:21:05,025 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1497364689294_0017_01_000002
> 2017-06-14 11:21:05,028 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1497364689294_0017_01_000002 is : 143
> 2017-06-14 11:21:05,040 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1497364689294_0017_01_000002 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL
> 2017-06-14 11:21:05,041 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hadoop OPERATION=Container Finished - Killed TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1497364689294_0017 CONTAINERID=container_1497364689294_0017_01_000002
> 2017-06-14 11:21:05,041 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1497364689294_0017_01_000002 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to DONE
>
> It puzzles me why, in step 2, Kylin wants applications on other nodes to load a local file. How can I solve it?
>
> Here is some additional information (it may be helpful for analyzing the problem):
> The cluster has 4 machines: A, B, C and D.
> Hadoop version 2.5.0, with snappy support
> Namenode: A (standby), B (active)
> Datanode: all
> Hive version 0.13.1, recompiled for hadoop2
> HBase version 0.98.6, recompiled for hadoop 2.5.0
> Master: A (active) and B
> When I set "hbase.rootdir" in hbase-site.xml to the exact IP address of the active namenode, step 2 is OK, but the build fails at the last 5 steps. So I changed the setting to the cluster name, and there is no problem in the hbase logs.
>
> Thank you
>
> Best regards
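Regarding the hbase.rootdir note above: with an active/standby namenode pair, hbase.rootdir should normally reference the HA nameservice ID (the value of dfs.nameservices in hdfs-site.xml) rather than one namenode's IP address, and that hdfs-site.xml has to be visible on HBase's classpath so the nameservice can be resolved. A sketch, where "mycluster" is only a placeholder for your actual nameservice name:

```xml
<!-- hbase-site.xml: "mycluster" must match dfs.nameservices in hdfs-site.xml -->
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://mycluster/hbase</value>
</property>
```

Pointing at a single namenode's IP works only while that namenode stays active, which would explain the behavior you saw after a failover.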
