[
https://issues.apache.org/jira/browse/AMBARI-25400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
amarnath reddy pappu updated AMBARI-25400:
------------------------------------------
Description:
If the collector returns a 500 error for
http://collectorhost:port/ws/v1/timeline/metrics/livenodes
the sink is unable to determine a live/healthy collector.
The sink tries another collector only if the first collector is
unreachable (IOException). Any response code other than 200 should
also be treated as the first collector being unreachable.
[https://github.com/apache/ambari/blob/release-2.7.4/ambari-metrics/ambari-metrics-common/src/main/java/org/apache/hadoop/metrics2/sink/timeline/AbstractTimelineMetricsSink.java#L629]
Possible fix
{code:java}
if (responseCode == 200) {
  try (InputStream in = connection.getInputStream()) {
    StringWriter writer = new StringWriter();
    IOUtils.copy(in, writer);
    try {
      collectors = gson.fromJson(writer.toString(),
          new TypeToken<List<String>>(){}.getType());
    } catch (JsonSyntaxException jse) {
      // Swallow this at the behest of still trying to POST
      LOG.debug("Exception deserializing the json data on live " +
          "collector nodes.", jse);
    }
  }
} else if (responseCode == 500) {
  // Treat an internal server error like an unreachable collector
  String warnMsg = "Unable to connect to collector to find live nodes, " +
      "Internal server error";
  throw new MetricCollectorUnavailableException(warnMsg);
}
{code}
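The description asks for any non-200 response to count as "collector not reachable", while the snippet above only covers 500. The following is a minimal, self-contained sketch of that broader behaviour, not the actual sink code: `LiveCollectorFailover`, `CollectorUnavailableException`, `requireOk`, `firstLiveCollector`, and the `probe` function are all hypothetical names introduced for illustration. The probe stands in for the HTTP GET against the livenodes endpoint and is assumed to signal a connection failure with an `UncheckedIOException`.

```java
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

final class LiveCollectorFailover {

  /** Hypothetical stand-in for the sink's MetricCollectorUnavailableException. */
  static final class CollectorUnavailableException extends Exception {
    CollectorUnavailableException(String message) {
      super(message);
    }
  }

  /** Rejects every status other than 200, not only 500. */
  static void requireOk(String host, int responseCode)
      throws CollectorUnavailableException {
    if (responseCode != 200) {
      throw new CollectorUnavailableException(
          "Collector " + host + " returned HTTP " + responseCode);
    }
  }

  /**
   * Returns the first host whose probe answers 200.
   * The probe is assumed to return the HTTP status code, or to throw
   * UncheckedIOException when the host cannot be reached at all.
   */
  static String firstLiveCollector(List<String> hosts, ToIntFunction<String> probe)
      throws CollectorUnavailableException {
    List<String> failures = new ArrayList<>();
    for (String host : hosts) {
      try {
        requireOk(host, probe.applyAsInt(host));
        return host;
      } catch (CollectorUnavailableException | UncheckedIOException e) {
        // Both cases mean "try the next collector".
        failures.add(host + ": " + e.getMessage());
      }
    }
    throw new CollectorUnavailableException("No live collector found: " + failures);
  }
}
```

With this shape, a collector that answers 500 (or 503, 404, and so on) is skipped in the same way as one that refuses the connection, so the caller moves on to the next configured collector instead of giving up.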
> Issue while determining live collector in case of HA
> ----------------------------------------------------
>
> Key: AMBARI-25400
> URL: https://issues.apache.org/jira/browse/AMBARI-25400
> Project: Ambari
> Issue Type: Bug
> Components: ambari-metrics
> Affects Versions: 2.6.2, 2.7.4
> Reporter: amarnath reddy pappu
> Priority: Major