[
https://issues.apache.org/jira/browse/AMBARI-17785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hemanth Yamijala updated AMBARI-17785:
--------------------------------------
Attachment: AMBARI-17785-1.patch
Attaching a new patch with some improvements. The full set of functionality
included in this patch:
* Supports one-at-a-time processing of log events in the {{OutputS3File}} case.
These are spooled locally and uploaded periodically to S3.
* Supports upload based on two criteria: a file size threshold and a time-based
threshold.
* Refactors code to achieve the above without duplicating any existing
functions, e.g. the code path to upload files all at once is still
retained and uses the same helper classes, such as {{S3Uploader}}.
* Unit tests added for all new code.
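To illustrate the two upload criteria listed above, here is a minimal sketch of how a size-or-age check on a local spool file might look. This is not code from the patch: the class {{SpoolRolloverPolicy}}, its methods, and its parameters are all hypothetical names chosen for illustration, and the actual logic in {{OutputS3File}} may differ.

```java
// Hypothetical sketch only: none of these names are taken from the AMBARI-17785 patch.
// It models the decision "upload the spool file when it is big enough OR old enough".
final class SpoolRolloverPolicy {
    private final long maxBytes;      // assumed file size threshold, in bytes
    private final long maxAgeMillis;  // assumed time-based threshold, in milliseconds
    private long spooledBytes;        // bytes written to the current spool file
    private long spoolStartMillis;    // when the current spool file was started

    SpoolRolloverPolicy(long maxBytes, long maxAgeMillis, long nowMillis) {
        this.maxBytes = maxBytes;
        this.maxAgeMillis = maxAgeMillis;
        this.spooledBytes = 0;
        this.spoolStartMillis = nowMillis;
    }

    /** Records that one log event of the given size was appended to the spool file. */
    void recordEvent(int eventBytes) {
        spooledBytes += eventBytes;
    }

    /** Returns true when either threshold is met and there is something to upload. */
    boolean shouldUpload(long nowMillis) {
        if (spooledBytes == 0) {
            return false; // an empty spool file is never uploaded
        }
        boolean sizeExceeded = spooledBytes >= maxBytes;
        boolean ageExceeded = (nowMillis - spoolStartMillis) >= maxAgeMillis;
        return sizeExceeded || ageExceeded;
    }

    /** Starts a fresh spool file after the previous one has been handed off for upload. */
    void reset(long nowMillis) {
        spooledBytes = 0;
        spoolStartMillis = nowMillis;
    }
}
```

In such a design, the caller would invoke {{recordEvent}} for each spooled log event, check {{shouldUpload}} with the current time, and call {{reset}} once the file has been handed to the uploader. Passing the clock value in explicitly keeps the check easy to unit test.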
There is still a lot left before this is production quality, including error
handling, configuration, and security. I will take these up separately. I am
still blocked by AMBARI-17788 from making this patch available or uploading it
to Review Board.
> Provide support for S3 as a first class destination for log events
> ------------------------------------------------------------------
>
> Key: AMBARI-17785
> URL: https://issues.apache.org/jira/browse/AMBARI-17785
> Project: Ambari
> Issue Type: Improvement
> Components: ambari-logsearch
> Reporter: Hemanth Yamijala
> Assignee: Hemanth Yamijala
> Attachments: AMBARI-17785-1.patch, AMBARI-17785.patch
>
>
> AMBARI-17045 added support for uploading Hadoop service logs from machines to
> S3. The intended usage there was as a one-time trigger where, on demand, the
> log files matching certain paths can be uploaded to a given S3 bucket and
> path.
> While useful, there are some use cases where we might need more than this
> one-time activity, particularly when clusters are deployed on ephemeral
> machines such as cloud instances:
> * The machines running the logfeeder could be irrevocably lost and in that
> case we would not be able to retrieve any logs.
> * If logs generated over a long period of time are all copied at once, the
> time to copy them at the end could extend cluster up-time and cost.
> It would be nice to have the ability to support S3 as another output
> destination in logsearch, just like Kafka, Solr, etc. This JIRA is to track
> work towards this enhancement.