Excellent, thanks Shaofeng!
We are trying to find the root cause.

Lu Jia

> On Nov 21, 2016, at 3:37 PM, ShaoFeng Shi <[email protected]> wrote:
> 
> Here is a small "watch dog" script that we used before; you can run it
> periodically with Linux cron to check whether Kylin is running. But this is
> not the approach we recommend; we strongly suggest that you investigate the
> root cause of the crashes. Usually they are caused by a badly designed cube,
> for example one that uses dictionary encoding for a UHC (ultra-high
> cardinality) column.
> 
> #!/bin/bash
> 
> export KYLIN_HOME="/kylin/kylin-1.5.4.1-bin"
> 
> # No pid file: Kylin is stopped (or was never started), so do nothing.
> if [ ! -f "${KYLIN_HOME}/pid" ]
> then
>   echo "$(date): kylin is stopped, do nothing"
>   exit 0
> fi
> 
> PID=$(cat "${KYLIN_HOME}/pid")
> 
> # ps -p succeeds only if a process with this pid is still alive.
> if ps -p "$PID" > /dev/null
> then
>   echo "$(date): Process is running, do nothing"
> else
>   echo "$(date): Pid ${PID} does not exist, start kylin"
>   # Set PATH explicitly so the Hadoop, HBase and Hive commands can be found.
>   export PATH=/usr/lib64/qt-3.3/bin:/usr/bin:/bin:/usr/local/bin::/usr/local/sbin:/usr/sbin:/sbin:/apache/hadoop/bin:/apache/hbase/bin:/apache/pig/bin:/apache/hive/bin
>   "$KYLIN_HOME/bin/kylin.sh" start
> fi
> 
> 
> 
> 2016-11-21 14:06 GMT+08:00 Li Yang <[email protected]>:
> 
>> Just to be clear, you still need to restart the Kylin process manually. Once
>> the Kylin process is up, it will resume all running jobs automatically.
>> 
>> To auto-restart a dead Kylin process, you need some tooling. It could be as
>> simple as a cron job that checks the Kylin PID periodically and restarts
>> Kylin when the process is dead.
>> 
>> Yang
>> 
>>> On Sat, Nov 19, 2016 at 6:00 PM, 路加126 <[email protected]> wrote:
>>> 
>>> Thanks Shaofeng!
>>> 
>>> 
>>> Best Regards,
>>> Lu Jia(Luke)
>>> 
>>> 
>>> 
>>>> On Nov 19, 2016, at 4:01 PM, ShaoFeng Shi <[email protected]> wrote:
>>>> 
>>>> The auto-resume has been there for some time; 1.5.4 should have it. I
>>>> suggest upgrading to the latest version.
>>>> 
>>>> 2016-11-19 14:55 GMT+08:00 路加126 <[email protected]>:
>>>> 
>>>>> Hi Yang,
>>>>> 
>>>>> Could you tell me how to configure automatic resuming?
>>>>> My job server process has crashed several times, and each time I resumed
>>>>> manually or via a shell script. My version is 1.5.2.
>>>>> 
>>>>> 
>>>>> Best Regards,
>>>>> Lu Jia(Luke)
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Nov 18, 2016, at 5:43 PM, Li Yang <[email protected]> wrote:
>>>>>> 
>>>>>> Just to be clear, even now, a job won't have to restart from the
>>>>>> beginning after a job server crash. After the job server bounces, all
>>>>>> jobs will resume automatically from their last running step. Even
>>>>>> better, if the last running step is an MR job, the MR job will continue
>>>>>> to run without any loss. That is because the job server is just a
>>>>>> coordinator; it does not do any actual work by itself.
>>>>>> 
>>>>>> Yang
>>>>>> 
>>>>>>> On Fri, Nov 18, 2016 at 3:48 PM, 杨海乐 <[email protected]> wrote:
>>>>>>> 
>>>>>>> Thanks very much @康凯森
>>>>>>> Waiting for the upgrade.
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://apache-kylin.74782.x6.nabble.com/Kylin-Job-Server-is-single-point-tp6336p6342.html
>>>>>>> Sent from the Apache Kylin mailing list archive at Nabble.com.
>>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Best regards,
>>>> 
>>>> Shaofeng Shi 史少锋
>>> 
>>> 
>>> 
>> 
> 
> 
> 
> -- 
> Best regards,
> 
> Shaofeng Shi 史少锋
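
For anyone finding this thread later: the quoted watch dog script only helps if cron actually runs it. Below is a minimal sketch of how the crontab entry could be built. The script location (/kylin/kylin-watchdog.sh), the log file (/tmp/kylin-watchdog.log) and the five-minute interval are all assumptions for illustration; none of them come from the messages above.

```shell
#!/bin/bash

# Hypothetical locations; adjust to wherever the watch dog script is saved.
WATCHDOG="/kylin/kylin-watchdog.sh"
LOG="/tmp/kylin-watchdog.log"

# Run the check every 5 minutes and append its output to the log file.
CRON_LINE="*/5 * * * * ${WATCHDOG} >> ${LOG} 2>&1"

# Print the entry so it can be reviewed before it is installed.
echo "$CRON_LINE"

# To install it for the current user, something like this should work:
#   ( crontab -l 2>/dev/null; echo "$CRON_LINE" ) | crontab -
```

The entry is only printed here, not installed, so the sketch is safe to run as-is. As Shaofeng says, this is a stopgap; finding the root cause of the crashes is still the real fix.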

