[
https://issues.apache.org/jira/browse/HADOOP-19365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18022457#comment-18022457
]
ASF GitHub Bot commented on HADOOP-19365:
-----------------------------------------
steveloughran commented on code in PR #7723:
URL: https://github.com/apache/hadoop/pull/7723#discussion_r2363269891
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AAnalyticsAcceleratorStreamReading.java:
##########
@@ -109,6 +110,12 @@ public void testConnectorFrameWorkIntegration() throws
Throwable {
verifyStatisticCounterValue(ioStats, STREAM_READ_ANALYTICS_OPENED, 1);
fs.close();
verifyStatisticCounterValue(fs.getIOStatistics(),
ANALYTICS_STREAM_FACTORY_CLOSED, 1);
+
+ // Expect 4 audited requests. One HEAD, and 3 GETs. The 3 GETs are because
the read policy is WHOLE_FILE,
+ // in which case, AAL will start prefetching till EoF on file open in 8MB
chunks. The file read here
+ // s3://noaa-cors-pds/raw/2023/017/ohfh/OHFH017d.23_.gz, has a size of
~21MB, resulting in 3 GETS:
+ // [0-8388607, 8388608-16777215, 16777216-21511173].
Review Comment:
slick. These are parallel GETs aren't they?
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/impl/LoggingAuditor.java:
##########
@@ -69,6 +70,8 @@
import static org.apache.hadoop.fs.s3a.commit.CommitUtils.extractJobID;
import static org.apache.hadoop.fs.s3a.impl.HeaderProcessing.HEADER_REFERRER;
import static
org.apache.hadoop.fs.s3a.statistics.impl.StatisticsFromAwsSdkImpl.mapErrorStatusCodeToStatisticName;
+import static
software.amazon.s3.analyticsaccelerator.request.Constants.OPERATION_NAME;
Review Comment:
again, need a copy in `org.apache.hadoop.fs.audit.AuditConstants`
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AAnalyticsAcceleratorStreamReading.java:
##########
Review Comment:
what about assert in testMultiRowGroupParquet where you can assert that the
read took place after the open?
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/impl/LoggingAuditor.java:
##########
@@ -49,6 +50,7 @@
import org.apache.hadoop.fs.store.LogExactlyOnce;
import org.apache.hadoop.fs.store.audit.HttpReferrerAuditHeader;
import org.apache.hadoop.security.UserGroupInformation;
+import software.amazon.s3.analyticsaccelerator.request.RequestAttributes;
Review Comment:
even if it is a bug they fix, we can't bump the SDK right now. Is there some
clever way to use reflection to do this binding, through our DynMethods code?
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/AWSRequestAnalyzer.java:
##########
@@ -50,6 +51,8 @@
import static
org.apache.hadoop.fs.statistics.StoreStatisticNames.OBJECT_LIST_REQUEST;
import static
org.apache.hadoop.fs.statistics.StoreStatisticNames.OBJECT_PUT_REQUEST;
import static
org.apache.hadoop.fs.statistics.StoreStatisticNames.STORE_EXISTS_PROBE;
+import static
software.amazon.s3.analyticsaccelerator.request.Constants.OPERATION_NAME;
Review Comment:
need to copy this so if AAL isn't on the classpath things don't break
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/impl/LoggingAuditor.java:
##########
@@ -267,8 +269,9 @@ HttpReferrerAuditHeader getReferrer(AuditSpanS3A span) {
*/
private class LoggingAuditSpan extends AbstractAuditSpanImpl {
- private final HttpReferrerAuditHeader referrer;
+ private HttpReferrerAuditHeader referrer;
+ private final HttpReferrerAuditHeader.Builder headerBuilder;
Review Comment:
add a newline after and a javadoc before to explain what it is (an
incomplete header) and why it is there
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/impl/LoggingAuditor.java:
##########
@@ -384,12 +388,33 @@ public SdkHttpRequest
modifyHttpRequest(Context.ModifyHttpRequest context,
SdkHttpRequest httpRequest = context.httpRequest();
SdkRequest sdkRequest = context.request();
+ // If spanId and operationName are set in execution attributes, then use
these values,
+ // instead of the ones in the current span. This is useful when requests
are happening in dependencies such as
+ // the analytics accelerator library (AAL), where they cannot be
attached to the correct span. In which case, AAL
+ // will attach the current spanId and operationName via execution
attributes during it's request creation. These
+ // can then used to update the values in the logger and referrer header.
Without this overwriting, the operation
+ // name and corresponding span will be whichever is active on the thread
the request is getting executed on.
+ boolean isRequestAuditedOutsideCurrentSpan =
isRequestAuditedOutsideOfCurrentSpan(executionAttributes);
+
+ String spanId = isRequestAuditedOutsideCurrentSpan ?
+ executionAttributes.getAttribute(SPAN_ID) : getSpanId();
+
+ String operationName = isRequestAuditedOutsideCurrentSpan ?
+ executionAttributes.getAttribute(OPERATION_NAME) :
getOperationName();
+
+ if (isRequestAuditedOutsideCurrentSpan) {
+ this.headerBuilder.withSpanId(spanId);
+ this.headerBuilder.withOperationName(operationName);
+ this.referrer = this.headerBuilder.build();
+ }
+
// attach range for GetObject requests
attachRangeFromRequest(httpRequest, executionAttributes);
// for delete op, attach the number of files to delete
attachDeleteKeySizeAttribute(sdkRequest);
+
Review Comment:
cut
> S3A Analytics-Accelerator: Add audit header support
> ----------------------------------------------------
>
> Key: HADOOP-19365
> URL: https://issues.apache.org/jira/browse/HADOOP-19365
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Ahmar Suhail
> Priority: Major
> Labels: pull-request-available
>
> S3A adds audit information as referrer headers, see
> [https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/auditing.md]
> for documentation on this.
> These are attached using execution interceptors, see ActiveAuditManagerS3A.
> createExecutionInterceptors().
>
> analyitcs-accelerator-s3 makes the GET requests now but does not update the
> referrer with GETs.
>
> See LoggingAuditor.attachRangeFromRequest() for how this is done.
>
> When using analytics-accelerator-s3, audit headers from S3A should be sent
> through.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]