[ 
https://issues.apache.org/jira/browse/AMBARI-19906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858408#comment-15858408
 ] 

Olivér Szabó commented on AMBARI-19906:
---------------------------------------

Overall I like the idea; it sounds like great functionality.

But I have some concerns about the solution itself:
1. As Miklos mentioned, we cannot be sure about the size of the data (it could 
be too much for e.g. 1000 nodes).
2. The steps are: get logs -> zip them -> upload to HDFS. That sounds like a 
Spark job flow (or something similar), and Log Search should not be responsible 
for managing something like that. In the proposal a *.py script does the steps, 
but if we implemented something like that, Log Search would have to monitor 
what happens during the process, make sure it finishes, and report how the 
progress is going. Log Search is stateless for now (on the Solr side we only 
store log level filters in a collection, and we will move even that to 
ZooKeeper in Ambari 3.0); we would probably have to introduce state into Log 
Search (like DB support), which would make it much, much harder to make Log 
Search HA in the future. (And I have not even mentioned things like security 
on HDFS if the cluster is Kerberized, etc.)
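To make the concern concrete, here is a minimal sketch of what the proposal's *.py script would have to do for the zip and upload steps (the directory layout, function names, and the `hdfs dfs -put` invocation are assumptions for illustration, not the actual proposal's code):

```python
import subprocess
import zipfile
from pathlib import Path


def zip_logs(log_dir: str, archive_path: str) -> str:
    """Zip every *.log file under log_dir, preserving the per-service layout."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for log_file in Path(log_dir).rglob("*.log"):
            # Store paths relative to log_dir, e.g. "hdfs/namenode.log"
            zf.write(log_file, log_file.relative_to(log_dir))
    return archive_path


def upload_to_hdfs(local_path: str, hdfs_dir: str) -> None:
    """Push the archive to HDFS via the hdfs CLI. Requires a configured HDFS
    client on the host; on a Kerberized cluster a valid ticket is also needed."""
    subprocess.run(["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir], check=True)
```

Even in this toy form, something has to run it, retry it on failure, and report progress, and that monitoring is exactly the state Log Search does not want to hold.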

Actually, we already have some cases where we need to archive logs. For 
example: we can store Ranger audit logs in Solr; we keep data in Solr for 
about one week (by default), but since those are audit logs, it could be 
important to store them long term. There we need some mechanism to archive 
them to HDFS (and optionally clear them from Solr if needed). To solve that we 
will need a new component (call it infra-solr-manager) with which we could 
schedule batch jobs to do what we want. This problem sounds quite similar to 
that one.
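The archive-then-clear step for the audit case could look something like this hypothetical infra-solr-manager batch helper (the JSON-lines output format and helper names are my assumptions; fetching the documents from Solr and the actual HDFS write are out of scope here):

```python
import json


def archive_batch(docs, out_file):
    """Write one JSON document per line (a file ready to be pushed to HDFS)
    and return the Solr ids that are then safe to delete."""
    archived_ids = []
    for doc in docs:
        out_file.write(json.dumps(doc) + "\n")
        archived_ids.append(doc["id"])
    return archived_ids


def solr_delete_payload(ids):
    """Build the delete-by-id payload accepted by Solr's JSON update handler."""
    return {"delete": ids}
```

The point is that this is a scheduled batch job with its own lifecycle, which fits a separate manager component much better than the Log Search portal.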

So what I suggest is to create that new component; through its REST API, 
Log Search could ask it to fetch the logs, and the infra-solr-manager 
component could manage everything outside of the portal (transform the data, 
archive it, make it accessible once the processing is done, etc.). That way we 
could keep the Log Search portal stateless. (So the main difference between 
this and your proposal is that logsearch.py would be a brand-new component 
inside Ambari.)
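To keep the portal stateless, all job state would live in the new component. A toy sketch of the job tracking such an infra-solr-manager REST API could sit on top of (the endpoint shapes in the comments and the state names are invented for illustration):

```python
import itertools

# Hypothetical states a GET /jobs/<id> endpoint would report back to Log Search.
QUEUED, RUNNING, DONE, FAILED = "QUEUED", "RUNNING", "DONE", "FAILED"


class JobManager:
    """Holds all job state, so Log Search itself can stay stateless."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._jobs = {}

    def submit(self, request):
        # POST /jobs -> returns an id Log Search can poll later
        job_id = next(self._ids)
        self._jobs[job_id] = {"request": request, "state": QUEUED}
        return job_id

    def set_state(self, job_id, state):
        # Updated internally as the batch job progresses
        self._jobs[job_id]["state"] = state

    def status(self, job_id):
        # GET /jobs/<id> -> current state of the collection/archive job
        return self._jobs[job_id]["state"]
```

Log Search would only ever submit and poll; progress, retries, and completion handling stay inside infra-solr-manager.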



> Ambari Log Search – Single Click Log Collection and Download.
> -------------------------------------------------------------
>
>                 Key: AMBARI-19906
>                 URL: https://issues.apache.org/jira/browse/AMBARI-19906
>             Project: Ambari
>          Issue Type: Story
>          Components: ambari-logsearch, logsearch
>    Affects Versions: 2.4.0, 2.4.2
>            Reporter: George Mathew
>            Assignee: George Mathew
>             Fix For: trunk
>
>         Attachments: Proposal.pdf
>
>
> Ambari Log Search can be used to download log messages for further analysis. 
> It lets you download log messages generated by a specific service on a 
> specific node in the cluster. 
> The challenge comes when troubleshooting several services spanning several 
> nodes: there is no simple way to click and collect all the logs at once. I am 
> proposing a solution that would simplify the current actions needed to 
> collect logs down to a single button click. The proposed download button will 
> collect all the logs matching the search criteria, organize them by service, 
> place them on HDFS, and offer a download link.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
