steveloughran commented on code in PR #7214:
URL: https://github.com/apache/hadoop/pull/7214#discussion_r1952836839
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -1877,100 +1868,41 @@ private FSDataInputStream executeOpen(
fileInformation.applyOptions(readContext);
LOG.debug("Opening '{}'", readContext);
- if (this.prefetchEnabled) {
- Configuration configuration = getConf();
- initLocalDirAllocatorIfNotInitialized(configuration);
- return new FSDataInputStream(
- new S3APrefetchingInputStream(
- readContext.build(),
- createObjectAttributes(path, fileStatus),
- createInputStreamCallbacks(auditSpan),
- inputStreamStats,
- configuration,
- directoryAllocator));
- } else {
- return new FSDataInputStream(
- new S3AInputStream(
- readContext.build(),
- createObjectAttributes(path, fileStatus),
- createInputStreamCallbacks(auditSpan),
- inputStreamStats,
- new SemaphoredDelegatingExecutor(
- boundedThreadPool,
- vectoredActiveRangeReads,
- true,
- inputStreamStats)));
- }
- }
-
- /**
- * Override point: create the callbacks for S3AInputStream.
- * @return an implementation of the InputStreamCallbacks,
- */
- private S3AInputStream.InputStreamCallbacks createInputStreamCallbacks(
+ // what does the stream need
+ final StreamFactoryRequirements requirements =
+ getStore().factoryRequirements();
+
+ // calculate the permit count.
+ final int permitCount = requirements.streamThreads()
+ + requirements.vectoredIOContext().getVectoredActiveRangeReads();
+ // create an executor which is a subset of the
+ // bounded thread pool.
+ final SemaphoredDelegatingExecutor pool = new SemaphoredDelegatingExecutor(
+ boundedThreadPool,
+ permitCount,
+ true,
+ inputStreamStats);
+
+ // do not validate() the parameters as the store
+ // completes this.
+ ObjectReadParameters parameters = new ObjectReadParameters()
Review Comment:
Ooh, wiring up to history is good. But does it have an expiry? Can we turn it
off? I ask because caches can be their own source of pain, and for other use
cases they do cause problems.
If you look at how Parquet and Iceberg open files, they already have the file
status first, so we just need to wire up passing that FileStatus down, along
with the file type and, if known, the footer location.
Parquet does now pass its status down, so the HEAD request is skipped.
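For reference, a minimal sketch of what "passing the status down" looks like
from the caller's side, using the standard FileSystem.openFile() builder and
withFileStatus(); the helper class and method names here are illustrative
only, not part of this PR:

```java
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;

/**
 * Sketch only: open a file whose FileStatus is already known
 * (e.g. from an earlier listing), handing the status down so
 * S3A can skip its own HEAD request.
 */
public final class OpenWithStatus {

  public static FSDataInputStream open(FileSystem fs, FileStatus status)
      throws Exception {
    return fs.openFile(status.getPath())
        // pass the known status: length comes from it, no HEAD needed
        .withFileStatus(status)
        // hint the expected read pattern; "random" suits columnar formats
        .opt("fs.option.openfile.read.policy", "random")
        .build()
        .get();
  }

  private OpenWithStatus() {
  }
}
```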
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]