qianmoQ commented on issue #8605: Failed to publish segments because of [java.lang.RuntimeException: Aborting transaction!]. URL: https://github.com/apache/druid/issues/8605#issuecomment-570538079

> [coordinator-overlord.log](https://github.com/apache/druid/files/4018386/coordinator-overlord.log)
> I encountered this problem too. As a newcomer to Druid, I have no idea how to solve it.
> The data in Kafka may be ingested again when the next new task runs. I can query those old records before the segment fails, but after the failure those records are gone.

This problem is caused by insufficient memory/CPU resources on the node running the current task. Druid fails to release tasks that have already completed; they stay resident in memory and exhaust resources. The only workaround is to release these resources manually. You can use a script like the following to monitor and free them:

```sh
#!/bin/bash
# Collect the PIDs of the running Druid task JVMs.
DRUID_RUNNING_TASKS_PIDS=`ps -ef f | grep '\_ java -cp conf/druid/_common:conf/druid/middleManager:lib' | grep -v grep | awk '{print $2}'`
CURRENT_TIMESTAMP=`date +%s`
for pid in $DRUID_RUNNING_TASKS_PIDS
do
    CURRENT_START_TIME=`ps -p $pid -o lstart | tail -1`
    TEMP=`date -d "$CURRENT_START_TIME" +%s`
    TIME_DIFF=$(($CURRENT_TIMESTAMP - $TEMP))
    # Kill any task process that has been running longer than the threshold (seconds).
    if [[ $TIME_DIFF -gt 3600 ]]; then
        echo "current PID $pid, start time $CURRENT_START_TIME, timestamp $TEMP"
        kill -9 $pid
    fi
done
```

Change the `3600` in the script to a value larger than the expected runtime of your deployed tasks; to keep the data service safe, make it roughly twice the task duration. Then add the script to the system crontab, for example:

```sh
*/5 * * * * /bin/sh /hadoop/data1/druid-0.12.3/druid-task-monitor.sh
```

Mine is a check every 5 minutes to free up resources.
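As a side note, on Linux the `lstart`/`date` parsing above can be avoided: procps `ps` can report a process's elapsed runtime in seconds directly via the `etimes` output format. A minimal sketch of the staleness check using that approach (`task_is_stale` and the 3600-second `THRESHOLD` are illustrative names, not part of Druid):

```sh
#!/bin/bash
# Hypothetical helper: decide whether a PID has been alive longer than a threshold.
# ps -o etimes= prints elapsed seconds since the process started (procps, Linux).
THRESHOLD=3600

task_is_stale() {
    local pid="$1"
    local elapsed
    elapsed=$(ps -p "$pid" -o etimes= | tr -d ' ')
    # Stale only if the PID exists and its elapsed time exceeds the threshold.
    [ -n "$elapsed" ] && [ "$elapsed" -gt "$THRESHOLD" ]
}

# Example: the current shell's own PID was just started, so it is not stale.
if task_is_stale $$; then
    echo "stale"
else
    echo "fresh"
fi
```

This sidesteps locale-dependent `lstart` formatting, which `date -d` may fail to parse on some systems.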
