Excellent, thanks Shaofeng! We are trying to find the root cause.

Lu Jia
> On Nov 21, 2016, at 3:37 PM, ShaoFeng Shi <[email protected]> wrote:
>
> Sharing a small "watch dog" script that we used before; you can run it
> periodically with Linux cron to check whether Kylin is running. But this is
> not the approach we recommend; we strongly suggest you investigate the root
> cause of the crashes. Usually they are caused by a badly designed cube, such
> as one using Dictionary encoding for a UHC (ultra-high-cardinality) column.
>
> #!/bin/bash
> export KYLIN_HOME="/kylin/kylin-1.5.4.1-bin"
>
> # No pid file means Kylin was stopped on purpose; leave it alone.
> if [ ! -f "${KYLIN_HOME}/pid" ]
> then
>     echo "$(date): kylin is stopped, do nothing"
>     exit 0
> fi
>
> PID=$(cat "${KYLIN_HOME}/pid")
> if ps -p "$PID" > /dev/null
> then
>     echo "$(date): Process is running, do nothing"
> else
>     echo "$(date): Pid ${PID} does not exist, starting kylin"
>     # cron runs with a minimal environment, so set PATH explicitly.
>     export PATH=/usr/lib64/qt-3.3/bin:/usr/bin:/bin:/usr/local/bin:/usr/local/sbin:/usr/sbin:/sbin:/apache/hadoop/bin:/apache/hbase/bin:/apache/pig/bin:/apache/hive/bin
>     "$KYLIN_HOME/bin/kylin.sh" start
> fi
>
> 2016-11-21 14:06 GMT+08:00 Li Yang <[email protected]>:
>
>> Just to be clear, you still need to restart the Kylin process manually. And
>> once the Kylin process is up, it will resume all running jobs automatically.
>>
>> To auto-restart a dead Kylin process, you need some tools. It could be as
>> simple as a cron job that checks the Kylin PID periodically and restarts
>> Kylin when it's dead.
>>
>> Yang
>>
>>> On Sat, Nov 19, 2016 at 6:00 PM, 路加126 <[email protected]> wrote:
>>>
>>> Thanks Shaofeng!
>>>
>>> Best Regards,
>>> Lu Jia(Luke)
>>>
>>>> On Nov 19, 2016, at 4:01 PM, ShaoFeng Shi <[email protected]> wrote:
>>>>
>>>> The auto-resume has been there for some time; 1.5.4 should have it. I
>>>> suggest upgrading to the latest version.
>>>>
>>>> 2016-11-19 14:55 GMT+08:00 路加126 <[email protected]>:
>>>>
>>>>> Hi Yang:
>>>>>
>>>>> Could you tell me how to configure automatic resuming?
>>>>> The job server process has crashed on me several times, and I resumed
>>>>> manually or via a shell script. My version is 1.5.2.
>>>>>
>>>>> Best Regards,
>>>>> Lu Jia(Luke)
>>>>>
>>>>>> On Nov 18, 2016, at 5:43 PM, Li Yang <[email protected]> wrote:
>>>>>>
>>>>>> Just to be clear, even now, a job won't have to restart from the
>>>>>> beginning after a job server crash. After the job server bounces, all
>>>>>> jobs will resume automatically from their last running step. Even
>>>>>> better, if the last running step is an MR job, the MR job will continue
>>>>>> to run without any loss. That is because the job server is just a
>>>>>> coordinator; it does not do any actual work by itself.
>>>>>>
>>>>>> Yang
>>>>>>
>>>>>>> On Fri, Nov 18, 2016 at 3:48 PM, 杨海乐 <[email protected]> wrote:
>>>>>>>
>>>>>>> Thanks very much @康凯森
>>>>>>> Waiting for the upgrade.
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context: http://apache-kylin.74782.x6.nabble.com/Kylin-Job-Server-is-single-point-tp6336p6342.html
>>>>>>> Sent from the Apache Kylin mailing list archive at Nabble.com.
>>>>
>>>> --
>>>> Best regards,
>>>>
>>>> Shaofeng Shi 史少锋
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
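
For readers who want to try the cron-based check discussed in this thread, here is a minimal sketch of the detection step plus a possible crontab line. The helper name pid_alive, the script path, the log path, and the five-minute interval are all assumptions made for illustration; none of them come from Kylin itself. The sketch uses kill -0 rather than ps -p, which only works reliably when the cron job runs as the same user as the Kylin process.

```shell
#!/bin/bash
# Sketch only: pid_alive is a hypothetical helper, not part of Kylin.

# Return 0 if the pid file exists and names a live process, non-zero otherwise.
pid_alive() {
  local pid_file="$1" pid
  [ -f "$pid_file" ] || return 1
  pid=$(cat "$pid_file")
  # kill -0 sends no signal; it only checks that the process can be signalled.
  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Hypothetical crontab entry that runs a watch dog script every 5 minutes
# (both paths are placeholders, adjust them to your installation):
# */5 * * * * /path/to/kylin_watchdog.sh >> /tmp/kylin_watchdog.log 2>&1
```

A watch dog script could call pid_alive "${KYLIN_HOME}/pid" and start Kylin only when the pid file exists but the check fails, which matches the behavior of the script quoted above. As Shaofeng notes, this is a stopgap; finding the root cause of the crashes is still the recommended path.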
