Here is a small "watchdog" script that we used before; you can run it
periodically with Linux cron to check whether Kylin is running. This is not
the approach we recommend, though: we strongly suggest that you investigate
the root cause of the crashes. Usually they are caused by a badly designed
cube, such as one using dictionary encoding for a UHC
(ultra-high-cardinality) column.

#!/bin/bash
# Watchdog: restart Kylin if its pid file exists but the process is gone.
export KYLIN_HOME="/kylin/kylin-1.5.4.1-bin"

# No pid file: treat Kylin as stopped and do nothing.
if [ ! -f "${KYLIN_HOME}/pid" ]
then
    echo "$(date): kylin is stopped, do nothing"
    exit 0
fi

PID=$(cat "${KYLIN_HOME}/pid")
if ps -p "$PID" > /dev/null
then
    echo "$(date): Process is running, do nothing"
else
    echo "$(date): Pid ${PID} does not exist, start kylin"
    # cron runs with a minimal PATH, so set the one Kylin needs.
    export PATH=/usr/lib64/qt-3.3/bin:/usr/bin:/bin:/usr/local/bin:/usr/local/sbin:/usr/sbin:/sbin:/apache/hadoop/bin:/apache/hbase/bin:/apache/pig/bin:/apache/hive/bin
    "$KYLIN_HOME/bin/kylin.sh" start
fi

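
To schedule the script with cron, something like the following could work. This is only a rough sketch: the script location /kylin/kylin-watchdog.sh, the log file /tmp/kylin-watchdog.log, and the 5-minute interval are all assumptions of mine, so adjust them to your environment. It only builds and prints the crontab line; installing it is left as a commented-out step.

```shell
#!/bin/bash
# Assumed (hypothetical) locations; change them to where you saved the
# watchdog script and where you want its output to go.
WATCHDOG="/kylin/kylin-watchdog.sh"
LOGFILE="/tmp/kylin-watchdog.log"

# Run every 5 minutes and append both stdout and stderr to the log file.
CRON_LINE="*/5 * * * * ${WATCHDOG} >> ${LOGFILE} 2>&1"
echo "$CRON_LINE"

# To install it for the current user, append it to the existing crontab:
# (crontab -l 2>/dev/null; echo "$CRON_LINE") | crontab -
```

Remember to make the script executable (chmod +x) and to run the cron job as the same user that normally starts Kylin, so the pid file and logs keep consistent ownership.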
2016-11-21 14:06 GMT+08:00 Li Yang <[email protected]>:
> Just to be clear, you still need to restart the Kylin process manually. And
> once Kylin process is up, it will resume all running jobs automatically.
>
> To auto-restart a dead Kylin process, you need some tools. Could be as
> simple as a cron job, to detect Kylin PID periodically and restart it when
> it's dead.
>
> Yang
>
> On Sat, Nov 19, 2016 at 6:00 PM, 路加126 <[email protected]> wrote:
>
> > Thanks Shaofeng!
> >
> >
> > Best Regards,
> > Lu Jia(Luke)
> >
> >
> >
> > > > On Nov 19, 2016, at 4:01 PM, ShaoFeng Shi <[email protected]> wrote:
> > >
> > > The auto-resume has been there for some time; 1.5.4 should have it,
> > > suggest to upgrade to the latest version.
> > >
> > > 2016-11-19 14:55 GMT+08:00 路加126 <[email protected]>:
> > >
> > >> hi Yang:
> > >>
> > >> Could you tell me how to configure resuming automatically?
> > >> I met several times of job server process crash, resumed manually or via
> > >> shell script. My version is 1.5.2.
> > >>
> > >>
> > >> Best Regards,
> > >> Lu Jia(Luke)
> > >>
> > >>
> > >>
> > >>> On Nov 18, 2016, at 5:43 PM, Li Yang <[email protected]> wrote:
> > >>>
> > >>> Just to be clear, even now, jobs won't have to restart from the beginning
> > >>> after a job server crash. After the job server bounces, all jobs will
> > >>> resume automatically from their last running step. Even better, if the
> > >>> last running step is an MR job, the MR job will continue to run without
> > >>> any loss. That is because the job server is just a coordinator; it does
> > >>> not do any actual work by itself.
> > >>>
> > >>> Yang
> > >>>
> > >>> On Fri, Nov 18, 2016 at 3:48 PM, 杨海乐 <[email protected]> wrote:
> > >>>
> > >>>> Thanks very much @康凯森
> > >>>> waiting upgrade
> > >>>>
> > >>>>
> > >>>> --
> > >>>> View this message in context: http://apache-kylin.74782.x6.nabble.com/Kylin-Job-Server-is-single-point-tp6336p6342.html
> > >>>> Sent from the Apache Kylin mailing list archive at Nabble.com.
> > >>>>
> > >>
> > >>
> > >>
> > >
> > >
> > > --
> > > Best regards,
> > >
> > > Shaofeng Shi 史少锋
> >
> >
> >
>
--
Best regards,
Shaofeng Shi 史少锋