[ 
https://issues.apache.org/jira/browse/HDFS-15811?focusedWorklogId=550979&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-550979
 ]

ASF GitHub Bot logged work on HDFS-15811:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Feb/21 21:03
            Start Date: 10/Feb/21 21:03
    Worklog Time Spent: 10m 
      Work Description: zehaoc2 commented on a change in pull request #2670:
URL: https://github.com/apache/hadoop/pull/2670#discussion_r574074367



##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
##########
@@ -8667,6 +8676,9 @@ public void logAuditEvent(boolean succeeded, String userName,
         }
         sb.append("\t").append("proto=")
             .append(Server.getProtocol());
+        if (cmd.equals(CMD_COMPLETE_FILE) && status != null) {
+          sb.append("\t").append("fileSize=").append(status.getLen());

Review comment:
       There has been some discussion about compatibility issues when the 
audit log format changes. @daryn-sharp @kihwal Could you please share your 
thoughts?
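
To make the compatibility concern concrete, here is a minimal sketch of a consumer that would tolerate the extra field. It assumes the audit line is a tab-separated sequence of key=value pairs, as the `sb.append("\t")` calls in the diff suggest. The class name `AuditLineParser`, its methods, and the sample field values are hypothetical and are not part of Hadoop. A parser that looks fields up by key would keep working when `fileSize=` is appended; a parser that relies on field position or a fixed field count may not.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical helper, not part of Hadoop: parses one audit line by key.
class AuditLineParser {

    /** Splits a tab-separated line into key=value fields, keeping their order. */
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String token : line.split("\t")) {
            int eq = token.indexOf('=');
            if (eq <= 0) {
                continue; // skip tokens that have no key
            }
            fields.put(token.substring(0, eq), token.substring(eq + 1));
        }
        return fields;
    }

    /** Returns the fileSize field if present and numeric, otherwise empty. */
    static OptionalLong fileSize(Map<String, String> fields) {
        String raw = fields.get("fileSize");
        if (raw == null) {
            return OptionalLong.empty(); // line written without the new field
        }
        try {
            return OptionalLong.of(Long.parseLong(raw));
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }
}
```

With this approach, lines written before and after the change go through the same code path, and only callers that ask for `fileSize` see a difference.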




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 550979)
    Time Spent: 1h  (was: 50m)

> completeFile should log final file size
> ---------------------------------------
>
>                 Key: HDFS-15811
>                 URL: https://issues.apache.org/jira/browse/HDFS-15811
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Zehao Chen
>            Assignee: Zehao Chen
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Jobs, particularly Hive queries by non-headless users, can create an 
> excessive number of files (many hundreds of thousands). A single user's query 
> can generate a sustained burst of 60-80% of all creates for tens of minutes 
> or more and impact overall cluster performance. Adding the file size to the 
> log line allows us to identify excessive tiny or large files.
>  
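
As an illustration of the use case described above, the following sketch counts, per user, how many completed files fall below a size threshold. It is only a sketch under stated assumptions: the `ugi=` and `fileSize=` keys are assumed to appear as tab-separated key=value fields, and the class `TinyFileCounter`, its method, and the threshold parameter are hypothetical names introduced here, not an existing Hadoop tool.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: counts small files per user.
class TinyFileCounter {

    /** Counts, per user, the lines whose fileSize is below thresholdBytes. */
    static Map<String, Integer> countTinyFiles(List<String> lines, long thresholdBytes) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String user = null;
            long size = -1; // -1 means the line carried no usable fileSize
            for (String token : line.split("\t")) {
                if (token.startsWith("ugi=")) {
                    user = token.substring("ugi=".length());
                } else if (token.startsWith("fileSize=")) {
                    try {
                        size = Long.parseLong(token.substring("fileSize=".length()));
                    } catch (NumberFormatException e) {
                        size = -1;
                    }
                }
            }
            if (user != null && size >= 0 && size < thresholdBytes) {
                counts.merge(user, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Lines without a `fileSize` field are ignored, so the same scan can run over logs that mix old and new formats.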



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
