qianmoQ commented on issue #8605: Failed to publish segments because of [java.lang.RuntimeException: Aborting transaction!]. URL: https://github.com/apache/druid/issues/8605#issuecomment-570538079

> [coordinator-overlord.log](https://github.com/apache/druid/files/4018386/coordinator-overlord.log)
> I encountered this problem too. As a newcomer to Druid, I have no idea how to solve it.
> The data in Kafka may be ingested again when the next new task runs. I can query those old records before the segment fails, but after the failure those records are gone.

This problem is caused by insufficient memory/CPU resources on the node running the current task. Druid fails to release tasks that have already completed; they stay resident in memory and exhaust resources. The only workaround is to release these resources manually. You can use a script like the following to monitor and free them:

```sh
#!/bin/bash
# Collect the PIDs of the running Druid task JVMs.
DRUID_RUNNING_TASKS_PIDS=`ps -ef f | grep '\_ java -cp conf/druid/_common:conf/druid/middleManager:lib' | grep -v grep | awk '{print $2}'`
CURRENT_TIMESTAMP=`date +%s`
for pid in $DRUID_RUNNING_TASKS_PIDS
do
    CURRENT_START_TIME=`ps -p $pid -o lstart | tail -1`
    TEMP=`date -d "$CURRENT_START_TIME" +%s`
    TIME_DIFF=$(($CURRENT_TIMESTAMP - $TEMP))
    # Kill any task process that has been running longer than the threshold (seconds).
    if [[ $TIME_DIFF -gt 3600 ]]; then
        echo "current PID $pid, start time $CURRENT_START_TIME, timestamp $TEMP"
        kill -9 $pid
    fi
done
```

Change the `3600` in the script to a value larger than the expected runtime of your deployed tasks; to keep the data service safe, make it roughly twice the task duration. Then add the script to the system crontab, for example:

```sh
*/5 * * * * /bin/sh /hadoop/data1/druid-0.12.3/druid-task-monitor.sh
```

Mine is a check every 5 minutes to free up resources.
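As a side note, on Linux the `lstart`/`date` parsing above can be avoided: procps `ps` can report a process's elapsed runtime in seconds directly via the `etimes` output format. A minimal sketch of the staleness check using that approach (`task_is_stale` and the 3600-second `THRESHOLD` are illustrative names, not part of Druid):

```sh
#!/bin/bash
# Hypothetical helper: decide whether a PID has been alive longer than a threshold.
# ps -o etimes= prints elapsed seconds since the process started (procps, Linux).
THRESHOLD=3600

task_is_stale() {
    local pid="$1"
    local elapsed
    elapsed=$(ps -p "$pid" -o etimes= | tr -d ' ')
    # Stale only if the PID exists and its elapsed time exceeds the threshold.
    [ -n "$elapsed" ] && [ "$elapsed" -gt "$THRESHOLD" ]
}

# Example: the current shell's own PID was just started, so it is not stale.
if task_is_stale $$; then
    echo "stale"
else
    echo "fresh"
fi
```

This sidesteps locale-dependent `lstart` formatting, which `date -d` may fail to parse on some systems.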
