[jira] [Commented] (UIMA-4684) DUCC daemons log-to-file should never give up

Lou DeGenaro (JIRA) Mon, 02 Nov 2015 11:25:57 -0800

    [ 
https://issues.apache.org/jira/browse/UIMA-4684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14985835#comment-14985835
 ]


Lou DeGenaro commented on UIMA-4684:
------------------------------------

Shown during fix testing, here's an RM log file snippet where directory is over 
quota.  Notice the gap between 14:13:37 and 14:14:57.  The RM should be logging 
every 10 seconds.  During this time the file system exceeded quota.

<<<<<>>>>>
02 Nov 2015 14:13:27,908  INFO RM.Scheduler- N/A schedule  
------------------------------------------------
02 Nov 2015 14:13:27,908  INFO RM.JobManagerConverter- N/A createState  
Schedule sent to Orchestrator
02 Nov 2015 14:13:27,909  INFO RM.JobManagerConverter- N/A createState
Reservation 2 15GB
        Existing[1]: bluejws67-1.1^0
        Additions[0]:
        Removals[0]:

02 Nov 2015 14:13:27,917  INFO RM.ResourceManagerComponent- N/A runScheduler  
-------- 2 ------- Scheduling loop returns  --------------------
02 Nov 2015 14:13:28,457  INFO RM.ResourceManagerComponent- N/A NodeStability  
Initial node stability reached: scheduler started.
02 Nov 2015 14:13:37,903  INFO RM.ResourceManagerComponent- N/A 
onJobManagerStateUpdate  -------> OR state arrives
02 Nov 2015 14:13:37,903  INFO RM.ResourceManagerComponent- N/A runScheduler  
-------- 3 ------- Entering scheduling loop --------------------
02 Nov 2015 14:13:37,903  INFO RM.Scheduler- N/A nodeArrives  Total arrivals: 13
02 Nov 2015 14:13:37,904  INFO RM.NodePool- N/A reset  Nodepool: --default-- 
Maxorder set to 2
02 Nov 2015 14:13:37,904  INFO RM.Scheduler- N/A schedule  Scheduling 0  new 
jobs.  Existing jobs: 1
02 Nov 2015 14:13:37,904  INFO RM.Scheduler- N/A schedule  Run scheduler 0 with 
top-level nodepool --default--
02 Nov 2015 14:13:37,904  INFO RM.RmJob- 2 getPrjCap  System Cannot predict 
cap: init_wait false || time_per_item 0.0
02 Nov 2015 14:13:37,904  INFO RM.RmJob- 2 initJobCap  System O 1 Base cap: 1 
Expected future cap: 2147483647 potential cap 1 actual cap 1
02 Nov 2015 14:13:37,904  INFO RM.NodepoolScheduler- N/A schedule  Machine 
occupancy before schedule
02 Nov 2015 14:13:37,905  INFO RM.NodePool- N/A queryMachines  
================================== Query Machines Nodepool: --default-- 
=========================
02 Nov 2015 14:13:37,906  INFO RM.NodePool- N/A queryMachines
                 Name  Blacklisted Order Active Shares Unused Shares Memory 
(MB) Jobs
-------------------- ------------ ----- ------------- ------------- ----------- 
------ ...
         bluejws67-4        false     2             0             2       30720 
<none>[2]
         bluejws67-3        false     2             0             2       30720 
<none>[2]
         bluejws67-1        false     1             1             0       15360 
2
         bluejws67-2        false     1             0             1       15360 
<none>[1]

02 Nov 2015 14:13:37,906  INFO RM.NodePool- N/A queryMachines  
================================== End Query Machines Nodepool: --default-- 
======================
02 Nov 2015 14:13:37,906  INFO RM.NodePool- N/A reset  Nodepool: --d02 Nov 2015 
14:14:57,862  INFO RM.ResourceManagerComponent- N/A runScheduler  -------- 11 
------- Entering scheduling loop --------------------
02 Nov 2015 14:14:57,863  INFO RM.Scheduler- N/A nodeArrives  Total arrivals: 45
02 Nov 2015 14:14:57,863  INFO RM.NodePool- N/A reset  Nodepool: --default-- 
Maxorder set to 2
02 Nov 2015 14:14:57,863  INFO RM.Scheduler- N/A schedule  Scheduling 0  new 
jobs.  Existing jobs: 1
<<<<< >>>>>

Here is the corresponding RM console.  Notice the console was still being 
written during the time the file system quota was exceeded.

<<<<<>>>>>
02 Nov 2015 14:14:07,903  INFO RM.ResourceManagerComponent - J[N/A] T[48] 
runScheduler  -------- 6 ------- Scheduling loop returns  --------------------
02 Nov 2015 14:14:17,848  INFO RM.ResourceManagerEventListener - J[N/A] T[28] 
onOrchestratorStateUpdateEvent  Event arrives
02 Nov 2015 14:14:17,885  INFO RM.ResourceManagerComponent - J[N/A] T[28] 
onJobManagerStateUpdate  -------> OR state arrives
java.io.IOException: Disk quota exceeded
        at java.io.FileOutputStream.write(FileOutputStream.java:329)
        at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233)
       ...
Unable to log due to logging exception.
02 Nov 2015 14:14:17,891  INFO RM.ResourceManagerComponent - J[N/A] T[48] 
runScheduler  -------- 7 ------- Entering scheduling loop --------------------
02 Nov 2015 14:14:17,892  INFO RM.Scheduler - J[N/A] T[48] nodeArrives  Total 
arrivals: 29
<<<<<>>>>>

> DUCC daemons log-to-file should never give up
> ---------------------------------------------
>
>                 Key: UIMA-4684
>                 URL: https://issues.apache.org/jira/browse/UIMA-4684
>             Project: UIMA
>          Issue Type: Bug
>          Components: DUCC
>            Reporter: Lou DeGenaro
>            Assignee: Lou DeGenaro
>             Fix For: 2.1.0-Ducc
>
>
> Problem: When the common logging code fails to log to file, for example due 
> to a quota violation, it sets a flag to never try logging again.  The only 
> way to resume logging is to recycle the daemon.
> Resolution: The logger should always attempt to log to file..never give up 
> hope!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (UIMA-4684) DUCC daemons log-to-file should never give up

Reply via email to