[
https://issues.apache.org/jira/browse/UIMA-4684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14985835#comment-14985835
]
Lou DeGenaro commented on UIMA-4684:
------------------------------------
Captured during fix testing, here is an RM log file snippet from a run where the
log directory went over quota. Notice the gap between 14:13:37 and 14:14:57. The
RM should be logging every 10 seconds. During this gap the file system was over
quota, and the last entry before it is cut off mid-line.
<<<<<>>>>>
02 Nov 2015 14:13:27,908 INFO RM.Scheduler- N/A schedule
------------------------------------------------
02 Nov 2015 14:13:27,908 INFO RM.JobManagerConverter- N/A createState
Schedule sent to Orchestrator
02 Nov 2015 14:13:27,909 INFO RM.JobManagerConverter- N/A createState
Reservation 2 15GB
Existing[1]: bluejws67-1.1^0
Additions[0]:
Removals[0]:
02 Nov 2015 14:13:27,917 INFO RM.ResourceManagerComponent- N/A runScheduler
-------- 2 ------- Scheduling loop returns --------------------
02 Nov 2015 14:13:28,457 INFO RM.ResourceManagerComponent- N/A NodeStability
Initial node stability reached: scheduler started.
02 Nov 2015 14:13:37,903 INFO RM.ResourceManagerComponent- N/A
onJobManagerStateUpdate -------> OR state arrives
02 Nov 2015 14:13:37,903 INFO RM.ResourceManagerComponent- N/A runScheduler
-------- 3 ------- Entering scheduling loop --------------------
02 Nov 2015 14:13:37,903 INFO RM.Scheduler- N/A nodeArrives Total arrivals: 13
02 Nov 2015 14:13:37,904 INFO RM.NodePool- N/A reset Nodepool: --default--
Maxorder set to 2
02 Nov 2015 14:13:37,904 INFO RM.Scheduler- N/A schedule Scheduling 0 new
jobs. Existing jobs: 1
02 Nov 2015 14:13:37,904 INFO RM.Scheduler- N/A schedule Run scheduler 0 with
top-level nodepool --default--
02 Nov 2015 14:13:37,904 INFO RM.RmJob- 2 getPrjCap System Cannot predict
cap: init_wait false || time_per_item 0.0
02 Nov 2015 14:13:37,904 INFO RM.RmJob- 2 initJobCap System O 1 Base cap: 1
Expected future cap: 2147483647 potential cap 1 actual cap 1
02 Nov 2015 14:13:37,904 INFO RM.NodepoolScheduler- N/A schedule Machine
occupancy before schedule
02 Nov 2015 14:13:37,905 INFO RM.NodePool- N/A queryMachines
================================== Query Machines Nodepool: --default--
=========================
02 Nov 2015 14:13:37,906 INFO RM.NodePool- N/A queryMachines
Name                 Blacklisted  Order Active Shares Unused Shares Memory (MB) Jobs
-------------------- ------------ ----- ------------- ------------- ----------- ------ ...
bluejws67-4          false        2     0             2             30720       <none>[2]
bluejws67-3          false        2     0             2             30720       <none>[2]
bluejws67-1          false        1     1             0             15360       2
bluejws67-2          false        1     0             1             15360       <none>[1]
02 Nov 2015 14:13:37,906 INFO RM.NodePool- N/A queryMachines
================================== End Query Machines Nodepool: --default--
======================
02 Nov 2015 14:13:37,906 INFO RM.NodePool- N/A reset Nodepool: --d02 Nov 2015
14:14:57,862 INFO RM.ResourceManagerComponent- N/A runScheduler -------- 11
------- Entering scheduling loop --------------------
02 Nov 2015 14:14:57,863 INFO RM.Scheduler- N/A nodeArrives Total arrivals: 45
02 Nov 2015 14:14:57,863 INFO RM.NodePool- N/A reset Nodepool: --default--
Maxorder set to 2
02 Nov 2015 14:14:57,863 INFO RM.Scheduler- N/A schedule Scheduling 0 new
jobs. Existing jobs: 1
<<<<<>>>>>
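A gap like the one above can also be found mechanically. The following is a minimal sketch, not part of DUCC: the class name LogGapFinder and the method findGaps are hypothetical, and the timestamp pattern is inferred from the snippet above. It only reads timestamps at the start of a line, so a timestamp that follows a truncated entry on the same line is ignored and the gap is measured to the next complete line.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not DUCC code): reports silences in a log file.
public class LogGapFinder {
    // Timestamp layout as it appears in the snippet, e.g. "02 Nov 2015 14:13:37,906".
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("dd MMM yyyy HH:mm:ss,SSS", Locale.ENGLISH);
    private static final int TS_LENGTH = 24;

    /** Returns "previous -> next" for each pair of consecutive timestamps more than maxGap apart. */
    public static List<String> findGaps(List<String> lines, Duration maxGap) {
        List<String> gaps = new ArrayList<>();
        LocalDateTime previous = null;
        for (String line : lines) {
            if (line.length() < TS_LENGTH) {
                continue; // too short to start with a timestamp
            }
            LocalDateTime current;
            try {
                current = LocalDateTime.parse(line.substring(0, TS_LENGTH), TS);
            } catch (DateTimeParseException e) {
                continue; // continuation line without a leading timestamp
            }
            if (previous != null && Duration.between(previous, current).compareTo(maxGap) > 0) {
                gaps.add(previous + " -> " + current);
            }
            previous = current;
        }
        return gaps;
    }
}
```

Since the RM logs roughly every 10 seconds, a threshold somewhat above that (15 seconds, say, which is an assumption rather than a DUCC setting) avoids flagging normal jitter: findGaps(Files.readAllLines(logPath), Duration.ofSeconds(15)).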
Here is the corresponding RM console output. Notice that the console was still
being written while the file system quota was exceeded.
<<<<<>>>>>
02 Nov 2015 14:14:07,903 INFO RM.ResourceManagerComponent - J[N/A] T[48]
runScheduler -------- 6 ------- Scheduling loop returns --------------------
02 Nov 2015 14:14:17,848 INFO RM.ResourceManagerEventListener - J[N/A] T[28]
onOrchestratorStateUpdateEvent Event arrives
02 Nov 2015 14:14:17,885 INFO RM.ResourceManagerComponent - J[N/A] T[28]
onJobManagerStateUpdate -------> OR state arrives
java.io.IOException: Disk quota exceeded
at java.io.FileOutputStream.write(FileOutputStream.java:329)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233)
...
Unable to log due to logging exception.
02 Nov 2015 14:14:17,891 INFO RM.ResourceManagerComponent - J[N/A] T[48]
runScheduler -------- 7 ------- Entering scheduling loop --------------------
02 Nov 2015 14:14:17,892 INFO RM.Scheduler - J[N/A] T[48] nodeArrives Total
arrivals: 29
<<<<<>>>>>
> DUCC daemons log-to-file should never give up
> ---------------------------------------------
>
> Key: UIMA-4684
> URL: https://issues.apache.org/jira/browse/UIMA-4684
> Project: UIMA
> Issue Type: Bug
> Components: DUCC
> Reporter: Lou DeGenaro
> Assignee: Lou DeGenaro
> Fix For: 2.1.0-Ducc
>
>
> Problem: When the common logging code fails to log to file, for example due
> to a quota violation, it sets a flag to never try logging again. The only
> way to resume logging is to recycle the daemon.
> Resolution: The logger should always attempt to log to file and never give
> up hope!
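
The resolution can be illustrated with a small sketch. This is not the DUCC logging code or the actual fix: the class PersistentFileLogger and its methods are hypothetical names, and appending with java.nio.file.Files is an assumed stand-in for whatever the common logging code really does. The point it shows is that a failed write is counted and reported to the console, but no flag is ever set that disables later attempts.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration (not DUCC code): a file logger that never gives up.
public class PersistentFileLogger {
    private final Path logFile;
    // Bookkeeping only; deliberately never consulted to skip a write.
    private long failedWrites = 0;

    public PersistentFileLogger(Path logFile) {
        this.logFile = logFile;
    }

    /** Tries to append one line. Returns false on failure, but the next call tries again. */
    public synchronized boolean log(String message) {
        byte[] bytes = (message + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
        try {
            Files.write(logFile, bytes, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            return true;
        } catch (IOException e) {
            // e.g. "Disk quota exceeded": report on the console and carry on.
            failedWrites++;
            System.err.println("Unable to log due to logging exception: " + e);
            return false;
        }
    }

    public synchronized long getFailedWrites() {
        return failedWrites;
    }
}
```

With this shape, once the quota condition clears, the next call to log(...) writes to the file again without recycling the daemon.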