[ 
https://issues.apache.org/jira/browse/HADOOP-19365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18022457#comment-18022457
 ] 

ASF GitHub Bot commented on HADOOP-19365:
-----------------------------------------

steveloughran commented on code in PR #7723:
URL: https://github.com/apache/hadoop/pull/7723#discussion_r2363269891


##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AAnalyticsAcceleratorStreamReading.java:
##########
@@ -109,6 +110,12 @@ public void testConnectorFrameWorkIntegration() throws 
Throwable {
     verifyStatisticCounterValue(ioStats, STREAM_READ_ANALYTICS_OPENED, 1);
     fs.close();
     verifyStatisticCounterValue(fs.getIOStatistics(), 
ANALYTICS_STREAM_FACTORY_CLOSED, 1);
+
+    // Expect 4 audited requests. One HEAD, and 3 GETs. The 3 GETs are because 
the read policy is WHOLE_FILE,
+    // in which case, AAL will start prefetching till EoF on file open in 8MB 
chunks. The file read here
+    // s3://noaa-cors-pds/raw/2023/017/ohfh/OHFH017d.23_.gz, has a size of 
~21MB, resulting in 3 GETS:
+    // [0-8388607, 8388608-16777215, 16777216-21511173].

Review Comment:
   slick. These are parallel GETs aren't they?



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/impl/LoggingAuditor.java:
##########
@@ -69,6 +70,8 @@
 import static org.apache.hadoop.fs.s3a.commit.CommitUtils.extractJobID;
 import static org.apache.hadoop.fs.s3a.impl.HeaderProcessing.HEADER_REFERRER;
 import static 
org.apache.hadoop.fs.s3a.statistics.impl.StatisticsFromAwsSdkImpl.mapErrorStatusCodeToStatisticName;
+import static 
software.amazon.s3.analyticsaccelerator.request.Constants.OPERATION_NAME;

Review Comment:
   again, need a copy in `org.apache.hadoop.fs.audit.AuditConstants`



##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AAnalyticsAcceleratorStreamReading.java:
##########


Review Comment:
   what about assert in testMultiRowGroupParquet where you can assert that the 
read took place after the open?



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/impl/LoggingAuditor.java:
##########
@@ -49,6 +50,7 @@
 import org.apache.hadoop.fs.store.LogExactlyOnce;
 import org.apache.hadoop.fs.store.audit.HttpReferrerAuditHeader;
 import org.apache.hadoop.security.UserGroupInformation;
+import software.amazon.s3.analyticsaccelerator.request.RequestAttributes;

Review Comment:
   even if it is a bug they fix, we can't bump the SDK right now. Is there some 
clever way to use reflection to do this binding, through our DynMethods code?



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/AWSRequestAnalyzer.java:
##########
@@ -50,6 +51,8 @@
 import static 
org.apache.hadoop.fs.statistics.StoreStatisticNames.OBJECT_LIST_REQUEST;
 import static 
org.apache.hadoop.fs.statistics.StoreStatisticNames.OBJECT_PUT_REQUEST;
 import static 
org.apache.hadoop.fs.statistics.StoreStatisticNames.STORE_EXISTS_PROBE;
+import static 
software.amazon.s3.analyticsaccelerator.request.Constants.OPERATION_NAME;

Review Comment:
   need to copy this so if AAL isn't on the classpath things don't break



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/impl/LoggingAuditor.java:
##########
@@ -267,8 +269,9 @@ HttpReferrerAuditHeader getReferrer(AuditSpanS3A span) {
    */
   private class LoggingAuditSpan extends AbstractAuditSpanImpl {
 
-    private final HttpReferrerAuditHeader referrer;
+    private HttpReferrerAuditHeader referrer;
 
+    private final HttpReferrerAuditHeader.Builder headerBuilder;

Review Comment:
   add a newline after and a javadoc before to explain what it is (an 
incomplete header) and why it is there



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/impl/LoggingAuditor.java:
##########
@@ -384,12 +388,33 @@ public SdkHttpRequest 
modifyHttpRequest(Context.ModifyHttpRequest context,
       SdkHttpRequest httpRequest = context.httpRequest();
       SdkRequest sdkRequest = context.request();
 
+      // If spanId and operationName are set in execution attributes, then use 
these values,
+      // instead of the ones in the current span. This is useful when requests 
are happening in dependencies such as
+      // the analytics accelerator library (AAL), where they cannot be 
attached to the correct span. In which case, AAL
+      // will attach the current spanId and operationName via execution 
attributes during it's request creation. These
+      // can then used to update the values in the logger and referrer header. 
Without this overwriting, the operation
+      // name and corresponding span will be whichever is active on the thread 
the request is getting executed on.
+      boolean isRequestAuditedOutsideCurrentSpan = 
isRequestAuditedOutsideOfCurrentSpan(executionAttributes);
+
+      String spanId = isRequestAuditedOutsideCurrentSpan ?
+              executionAttributes.getAttribute(SPAN_ID) : getSpanId();
+
+      String operationName = isRequestAuditedOutsideCurrentSpan ?
+              executionAttributes.getAttribute(OPERATION_NAME) : 
getOperationName();
+
+      if (isRequestAuditedOutsideCurrentSpan) {
+        this.headerBuilder.withSpanId(spanId);
+        this.headerBuilder.withOperationName(operationName);
+        this.referrer = this.headerBuilder.build();
+      }
+
       // attach range for GetObject requests
       attachRangeFromRequest(httpRequest, executionAttributes);
 
       // for delete op, attach the number of files to delete
       attachDeleteKeySizeAttribute(sdkRequest);
 
+

Review Comment:
   cut





> S3A Analytics-Accelerator: Add audit header support 
> ----------------------------------------------------
>
>                 Key: HADOOP-19365
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19365
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Ahmar Suhail
>            Priority: Major
>              Labels: pull-request-available
>
> S3A adds audit information as referrer headers, see 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/auditing.md]
>  for documentation on this. 
> These are attached using execution interceptors, see ActiveAuditManagerS3A.
> createExecutionInterceptors(). 
>  
> analyitcs-accelerator-s3 makes the GET requests now but does not update the 
> referrer with GETs. 
>  
> See LoggingAuditor.attachRangeFromRequest() for how this is done. 
>  
> When using analytics-accelerator-s3, audit headers from S3A should be sent 
> through. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to