[
https://issues.apache.org/jira/browse/AMBARI-19906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858408#comment-15858408
]
Olivér Szabó commented on AMBARI-19906:
---------------------------------------
Overall I like the idea; it sounds like a great piece of functionality.
But I have some concerns about the solution itself:
1. As Miklos mentioned, we cannot be sure about the size of the data (it could
be far too large for, e.g., 1000 nodes).
2. The steps are: get logs -> zip them -> upload to HDFS. That sounds like a
Spark job flow (or something similar), and Log Search should not be responsible
for managing something like that. In the proposal a *.py script would do the
steps, but if we implemented something like that, Log Search would have to
monitor what happens during the process, make sure it finishes, and track how
the work is progressing. Log Search is stateless for now (on the Solr side we
only store log-level filters in a collection, but we will move those to
ZooKeeper in Ambari 3.0), so we would likely have to add state to Log Search
(e.g., DB support), which would make it much, much harder to make Log Search HA
in the future. (And I have not even mentioned things like security on HDFS if
the cluster is Kerberized, etc.)
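To make the concern in point 2 concrete, here is a minimal sketch of what such a script would have to do (get logs -> zip -> upload). All paths, names, and the upload stub are hypothetical; a real job would also need progress tracking, retries, and Kerberos handling, which is exactly the state Log Search would have to manage:

```python
# Sketch of the proposed flow: collect logs -> zip -> upload to HDFS.
# Paths and the HDFS destination are illustrative; the upload is a stub
# (in a real job it would, e.g., PUT the archive via WebHDFS).
import os
import zipfile


def collect_and_zip(log_paths, archive_path):
    """Bundle collected log files into one zip, organized by service
    (here the parent directory name stands in for the service name)."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in log_paths:
            service = os.path.basename(os.path.dirname(path))
            zf.write(path, os.path.join(service, os.path.basename(path)))
    return archive_path


def upload_to_hdfs(archive_path, hdfs_dir):
    """Stub for the upload step; returns the target HDFS path only.
    A real implementation would also have to report progress and
    completion back to whoever scheduled the job."""
    return hdfs_dir.rstrip("/") + "/" + os.path.basename(archive_path)
```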
Actually, we already have some cases where we need to archive logs. For
example, we can store audit logs for Ranger in Solr; by default we keep data in
Solr for about one week, but since those are audit logs, it could be important
to store them long term. So we need some mechanism to archive them to HDFS (and
optionally clear them from Solr if needed). To solve that we will need a new
component (call it something like infra-solr-manager) with which we could
schedule batch jobs to do what we want. This problem sounds quite similar to
that one.
So what I suggest is to create that new component; through its REST API, Log
Search could ask it to fetch the logs, and the infra-solr-manager component
could manage everything outside of the portal (transform the data, archive it,
make it accessible once processing is done, etc.). That way we could keep the
Log Search portal stateless. (So the main difference between this and your
proposal is that logsearch.py would become a brand-new component inside Ambari.)
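As a rough illustration of the suggestion above, the portal could stay stateless by only constructing a job request and POSTing it to the new component. The endpoint, field names, and job shape below are purely hypothetical; no such API exists yet:

```python
# Hypothetical job request Log Search could POST to an
# infra-solr-manager endpoint (e.g. /api/v1/jobs) to schedule an
# export-and-archive batch job. Field names are illustrative only.
import json


def build_archive_job(collection, query, hdfs_dest, delete_from_solr=False):
    """Build the request body for an archive job: export documents
    matching `query` from `collection`, write the result to `hdfs_dest`,
    and optionally purge the exported documents from Solr afterwards."""
    return {
        "type": "archive",
        "collection": collection,
        "query": query,
        "destination": hdfs_dest,
        "deleteFromSolr": delete_from_solr,
    }
```

Log Search would then only poll the job's status over the same REST API instead of tracking the batch flow itself, which is what keeps the portal free of state.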
> Ambari Log Search – Single Click Log Collection and Download.
> -------------------------------------------------------------
>
> Key: AMBARI-19906
> URL: https://issues.apache.org/jira/browse/AMBARI-19906
> Project: Ambari
> Issue Type: Story
> Components: ambari-logsearch, logsearch
> Affects Versions: 2.4.0, 2.4.2
> Reporter: George Mathew
> Assignee: George Mathew
> Fix For: trunk
>
> Attachments: Proposal.pdf
>
>
> Ambari Log Search can be used to download log messages for further analysis.
> It lets you download log messages generated by a specific service on a
> specific node in the cluster.
> The challenge comes when troubleshooting several services spanning several
> nodes: there is no simple way to click and collect all of their logs at once.
> I am proposing a solution that reduces the actions currently needed to
> collect logs to the click of a button. The proposed download button will
> collect all the logs matching the search criteria, organize them by service,
> place them on HDFS, and offer a download link.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)