> On Ноя. 18, 2015, 7:10 п.п., Jonathan Hurley wrote:
> > ambari-agent/src/main/python/ambari_agent/RecoveryManager.py, line 840
> > <https://reviews.apache.org/r/40448/diff/2/?file=1131395#file1131395line840>
> >
> > Always use the temp directory configured in the agent, not hard coded
> > to /tmp.
def main(argv=None):
cmd_mgr = RecoveryManager('/tmp')
pass
That's main method and it is't called by the agent. It's called only during
deployment, agent configuration might not exist.
> On Ноя. 18, 2015, 7:10 п.п., Jonathan Hurley wrote:
> > ambari-agent/src/main/python/ambari_agent/FileCache.py, lines 42-43
> > <https://reviews.apache.org/r/40448/diff/2/?file=1131394#file1131394line42>
> >
> > I'm not so sure that these belong in the file cache - they are
> > directories that contain information pushed to the agents, not data that
> > the agents request.
It's convenient to have all the agent tmp directory paths defined in the same
place, but I can revert it if you insist.
> On Ноя. 18, 2015, 7:10 п.п., Jonathan Hurley wrote:
> > ambari-agent/src/main/python/ambari_agent/alerts/recovery_alert.py, line 82
> > <https://reviews.apache.org/r/40448/diff/2/?file=1131396#file1131396line82>
> >
> > Should `warned_threshold_reached` be a WARNING here? Or is it truly
> > CRITICAL?
warned_threshold_reached means that RecoveryManager won't make any more
attempts to recover component's state. That's requires close attention from
user, when maximum recovery attempts count reached
> On Ноя. 18, 2015, 7:10 п.п., Jonathan Hurley wrote:
> > ambari-agent/src/main/python/ambari_agent/alerts/recovery_alert.py, line 54
> > <https://reviews.apache.org/r/40448/diff/2/?file=1131396#file1131396line54>
> >
> > Grammar: The CRITICAL value of {1} must be greater than the WARNING
> > value of {2}
Fixed
> On Ноя. 18, 2015, 7:10 п.п., Jonathan Hurley wrote:
> > ambari-agent/src/main/python/ambari_agent/alerts/recovery_alert.py, line 92
> > <https://reviews.apache.org/r/40448/diff/2/?file=1131396#file1131396line92>
> >
> > lastReset is an object here, no?
> >
> > `datetime.datetime(1969, 12, 31, 19, 0)`
> >
> > Should we not be converting it to a human-readable string before
> > returning it to the base alert which generates the final string from the
> > parameters?
Yes, it's an object here, but .format() handles objects correctly. Alerts text
is somethinsg like "Metrics Collector has been auto-started 2 times since
2015-11-18 15:52:05.". I've attached screenshots for ok,warn,ctit to
https://issues.apache.org/jira/browse/AMBARI-13954
Added explicit str()
> On Ноя. 18, 2015, 7:10 п.п., Jonathan Hurley wrote:
> > ambari-agent/src/main/python/ambari_agent/alerts/recovery_alert.py, line 102
> > <https://reviews.apache.org/r/40448/diff/2/?file=1131396#file1131396line102>
> >
> > If there are no recovery operations, then ths will read "No recovery
> > operations executed for METRICS_COLLECTOR since 1969-12-31 19:00:00"
> >
> > That's a weird message - we should probably check to see if the value
> > of the datetime is 0 and then alter the statement to exclude the date.
Fixed
> On Ноя. 18, 2015, 7:10 п.п., Jonathan Hurley wrote:
> > ambari-server/src/main/java/org/apache/ambari/server/state/alert/RecoverySource.java,
> > lines 28-30
> > <https://reviews.apache.org/r/40448/diff/2/?file=1131404#file1131404line28>
> >
> > No need for the empty constructor.
I've added it intentionally as a best practice.
- Dmytro
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40448/#review107067
-----------------------------------------------------------
On Ноя. 18, 2015, 6:24 п.п., Dmytro Sen wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40448/
> -----------------------------------------------------------
>
> (Updated Ноя. 18, 2015, 6:24 п.п.)
>
>
> Review request for Ambari, Jonathan Hurley, Sumit Mohanty, and Sid Wagle.
>
>
> Bugs: AMBARI-13954
> https://issues.apache.org/jira/browse/AMBARI-13954
>
>
> Repository: ambari
>
>
> Description
> -------
>
> - In 2.1.3 we have the watch dog script that will shutdown the API if HBase
> is unresponsive for sometime.
> - We also have the ability to auto-start per service / component
> - The two should work in conjunction for AMS
> - User needs to be alerted if Restarts are too frequent
>
> Alternative approach is for watch to act as a monitor and be responsible for
> restarting HBase.
> This should still be a alert hook but in that case the alert can be
> customized for AMS only.
>
> To turn on AMS auto-start append ambari.properties with
>
> recovery.type=AUTO_START
> recovery.enabled_components=METRICS_COLLECTOR
>
>
> Diffs
> -----
>
> ambari-agent/src/main/python/ambari_agent/AlertSchedulerHandler.py d3aab87
> ambari-agent/src/main/python/ambari_agent/Controller.py 520d78d
> ambari-agent/src/main/python/ambari_agent/FileCache.py 4869e51
> ambari-agent/src/main/python/ambari_agent/RecoveryManager.py cab81f5
> ambari-agent/src/main/python/ambari_agent/alerts/recovery_alert.py
> PRE-CREATION
> ambari-agent/src/test/python/ambari_agent/TestActionQueue.py df8278b
> ambari-agent/src/test/python/ambari_agent/TestAlertSchedulerHandler.py
> a08e4bc
> ambari-agent/src/test/python/ambari_agent/TestAlerts.py 1e6da64
> ambari-agent/src/test/python/ambari_agent/TestHeartbeat.py 1f3609d
> ambari-agent/src/test/python/ambari_agent/TestRecoveryManager.py e6115e3
> ambari-server/conf/unix/ambari.properties 7f0a464
>
> ambari-server/src/main/java/org/apache/ambari/server/state/alert/AlertDefinitionFactory.java
> 4bc25f8
>
> ambari-server/src/main/java/org/apache/ambari/server/state/alert/RecoverySource.java
> PRE-CREATION
>
> ambari-server/src/main/java/org/apache/ambari/server/state/alert/SourceType.java
> 6c1aa9a
>
> ambari-server/src/main/resources/common-services/AMBARI_METRICS/0.1.0/alerts.json
> 319427d
>
> Diff: https://reviews.apache.org/r/40448/diff/
>
>
> Testing
> -------
>
> Unit tests passed
>
>
> Thanks,
>
> Dmytro Sen
>
>