[
https://issues.apache.org/jira/browse/HADOOP-18257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769341#comment-17769341
]
ASF GitHub Bot commented on HADOOP-18257:
-----------------------------------------
mukund-thakur commented on PR #6000:
URL: https://github.com/apache/hadoop/pull/6000#issuecomment-1736255549
> Not currently. Is that something we would have to write the logic off of,
> I'll have to check the code for it? Specifically for number of mappers maybe we
> could have a threshold of number of files and then paginate them based on that.

Looks like this is completely serial now. But you can think of this as a
follow-up and maybe add support for that in the future once this gets used.
Just create a jira for now.

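
For whoever picks up the follow-up, the threshold idea from the quoted question could look roughly like the sketch below. It is not code from the PR: `paginate_files`, `max_files_per_page`, and the sample file names are hypothetical, and how each page would be handed to a mapper is left open.

```python
def paginate_files(paths, max_files_per_page):
    """Split a list of audit log paths into pages of at most
    max_files_per_page entries, preserving the input order.

    Hypothetical helper: each page could later be assigned to one mapper
    instead of merging every file serially.
    """
    if max_files_per_page < 1:
        raise ValueError("max_files_per_page must be at least 1")
    return [
        paths[start:start + max_files_per_page]
        for start in range(0, len(paths), max_files_per_page)
    ]


# Illustrative input only; real audit log file names will differ.
audit_files = ["log-0", "log-1", "log-2", "log-3", "log-4"]
pages = paginate_files(audit_files, max_files_per_page=2)
```

With five files and a threshold of two, this yields three pages, the last one holding the single leftover file.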
> Analyzing S3A Audit Logs
> -------------------------
>
> Key: HADOOP-18257
> URL: https://issues.apache.org/jira/browse/HADOOP-18257
> Project: Hadoop Common
> Issue Type: Task
> Components: fs/s3
> Reporter: Sravani Gadey
> Assignee: Mehakmeet Singh
> Priority: Major
> Labels: pull-request-available
>
> The main aim is to analyze S3A audit logs to give better insights into Hive
> and Spark jobs.
> Steps involved are:
> * Merging audit log files containing a huge number of audit logs collected
> from a job containing various S3 requests.
> * Parsing audit logs using regular expressions, i.e., dividing them into
> key-value pairs.
> * Converting the key-value pairs into CSV and Avro file formats.
> * Querying the data to give better insights into different jobs.
> * Visualizing the audit logs in a Zeppelin or Jupyter notebook with graphs.
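
The parse, convert, and query steps listed in the issue can be illustrated with a minimal sketch. It assumes each audit entry carries a referrer-style URL whose query string holds the key=value pairs; the sample lines, the regular expression, and the function names are illustrative assumptions, not the tool implemented in the PR. Avro output and visualization are left out.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical sample entries. Real S3A audit logs carry more fields;
# these lines are made up for illustration.
SAMPLE_LOGS = [
    '"https://audit.example.org/hadoop/1/op_create/id-0001/'
    '?op=op_create&p1=data/a.csv&pr=alice&ts=1000"',
    '"https://audit.example.org/hadoop/1/op_rename/id-0002/'
    '?op=op_rename&p1=data/a.csv&p2=data/b.csv&pr=alice&ts=1005"',
]

# A pair starts after '?' or '&'; the value runs until the next '&',
# a double quote, or whitespace.
PAIR_RE = re.compile(r'[?&](\w+)=([^&"\s]*)')


def parse_audit_line(line):
    """Return the key-value pairs found in one audit log line."""
    return dict(PAIR_RE.findall(line))


def to_csv(records):
    """Render records as CSV text; columns are the sorted union of keys,
    and keys missing from a record become empty cells."""
    fields = sorted({key for record in records for key in record})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


records = [parse_audit_line(line) for line in SAMPLE_LOGS]
csv_text = to_csv(records)

# A simple "query": how many requests each operation issued.
ops_count = Counter(record["op"] for record in records)
```

Taking the union of keys for the header keeps entries with different fields (such as `p2` on a rename) in a single table, which is convenient for later querying.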
--
This message was sent by Atlassian Jira
(v8.20.10#820010)