[ https://issues.apache.org/jira/browse/HADOOP-18258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17581180#comment-17581180 ]

ASF GitHub Bot commented on HADOOP-18258:
-----------------------------------------

sravanigadey commented on code in PR #4383:
URL: https://github.com/apache/hadoop/pull/4383#discussion_r948701538


##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/mapreduce/S3AAuditLogMergerAndParser.java:
##########
@@ -0,0 +1,237 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *       http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.audit.mapreduce;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.regex.Matcher;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.avro.file.DataFileWriter;
+import org.apache.avro.io.DatumWriter;
+import org.apache.avro.specific.SpecificDatumWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.LocatedFileStatus;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.RemoteIterator;
+import org.apache.hadoop.fs.s3a.audit.AvroDataRecord;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.mapred.LineRecordReader;
+
+import static org.apache.hadoop.fs.s3a.audit.S3LogParser.AWS_LOG_REGEXP_GROUPS;
+import static org.apache.hadoop.fs.s3a.audit.S3LogParser.LOG_ENTRY_PATTERN;
+
+/**
+ * Merge all the audit logs present in a directory of
+ * multiple audit log files into a single audit log file.
+ */
+public class S3AAuditLogMergerAndParser {
+
+  public static final int MAX_LINE_LENGTH = 32000;
+  private static final Logger LOG =
+      LoggerFactory.getLogger(S3AAuditLogMergerAndParser.class);
+
+  /**
+   * Parses a single audit log entry into key-value pairs
+   * using regular expressions.
+   *
+   * @param singleAuditLog a single audit log entry from the merged audit log file
+   * @return a map of the key-value pairs parsed from the audit log entry
+   */
+  public HashMap<String, String> parseAuditLog(String singleAuditLog) {
+    HashMap<String, String> auditLogMap = new HashMap<>();
+    if (singleAuditLog == null || singleAuditLog.isEmpty()) {
+      LOG.debug(
+          "Received a null or empty string; expected a valid audit log entry to parse");
+      return auditLogMap;
+    }
+    final Matcher matcher = LOG_ENTRY_PATTERN.matcher(singleAuditLog);
+    boolean patternMatching = matcher.matches();
+    if (patternMatching) {
+      for (String key : AWS_LOG_REGEXP_GROUPS) {
+        try {
+          final String value = matcher.group(key);
+          auditLogMap.put(key, value);
+        } catch (IllegalStateException e) {
+          LOG.debug("Failed to read group {} from the matched log entry", key, e);

Review Comment:
   Yes, this catch block is required, as matcher.group(key) throws IllegalStateException.
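
For context, below is a minimal, self-contained sketch of the named-group parsing approach discussed above. The pattern and group names are illustrative stand-ins, not the actual S3LogParser.LOG_ENTRY_PATTERN or AWS_LOG_REGEXP_GROUPS constants:

import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupParseSketch {

  // Illustrative pattern with two named groups; the real parser uses
  // S3LogParser.LOG_ENTRY_PATTERN, which defines many more groups.
  private static final Pattern SAMPLE_PATTERN =
      Pattern.compile("(?<owner>\\S+) (?<bucket>\\S+)");

  private static final String[] SAMPLE_GROUPS = {"owner", "bucket"};

  public static Map<String, String> parse(String line) {
    Map<String, String> map = new HashMap<>();
    Matcher matcher = SAMPLE_PATTERN.matcher(line);
    // matches() must be called (and succeed) before group(name);
    // otherwise group(name) throws IllegalStateException.
    if (matcher.matches()) {
      for (String key : SAMPLE_GROUPS) {
        map.put(key, matcher.group(key));
      }
    }
    return map;
  }

  public static void main(String[] args) {
    // Prints the two parsed fields (map iteration order is not guaranteed).
    System.out.println(parse("ownerid mybucket"));
  }
}

Note that Matcher.group(name) throws IllegalStateException only when no match has been attempted or the previous match failed; once matches() has returned true, an unknown group name raises IllegalArgumentException instead, which may be worth double-checking in the catch block above.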





> Merging of S3A Audit Logs
> -------------------------
>
>                 Key: HADOOP-18258
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18258
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Sravani Gadey
>            Assignee: Sravani Gadey
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 12.5h
>  Remaining Estimate: 0h
>
> Merge audit log files containing a huge number of audit logs collected from a
> job, such as a Hive or Spark job, that issues various S3 requests such as
> list, head, get and put requests.
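
As a rough illustration of the merge step described above, here is a minimal, hypothetical sketch using the Hadoop FileSystem API. The class and method names are invented for illustration and are not the actual implementation on the PR branch:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.io.IOUtils;

/** Hypothetical sketch: concatenate every audit log file under logDir. */
public class AuditLogMergeSketch {

  public static void mergeAuditLogs(Configuration conf, Path logDir,
      Path mergedFile) throws IOException {
    FileSystem fs = logDir.getFileSystem(conf);
    // Overwrite the merged output file if it already exists.
    try (FSDataOutputStream out = fs.create(mergedFile, true)) {
      RemoteIterator<LocatedFileStatus> files = fs.listFiles(logDir, true);
      while (files.hasNext()) {
        LocatedFileStatus status = files.next();
        try (FSDataInputStream in = fs.open(status.getPath())) {
          // Append each source audit log file to the merged output.
          IOUtils.copyBytes(in, out, 4096, false);
        }
      }
    }
  }
}

Each merged line would then be fed through parseAuditLog as shown in the diff above.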



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
