steveloughran commented on code in PR #3079:
URL: https://github.com/apache/parquet-java/pull/3079#discussion_r1865749087


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopInputFile.java:
##########
@@ -70,9 +93,38 @@ public long getLength() {
     return stat.getLen();
   }
 
+  /**
+   * Open the file.
+   * <p>Uses {@code FileSystem.openFile()} so that
+   * the existing FileStatus can be passed down: saves a HEAD request on cloud 
storage.
+   * and ignored everywhere else.
+   *
+   * @return the input stream.
+   *
+   * @throws InterruptedIOException future was interrupted
+   * @throws IOException if something went wrong
+   * @throws RuntimeException any nested RTE thrown
+   */
   @Override
   public SeekableInputStream newStream() throws IOException {
-    return HadoopStreams.wrap(fs.open(stat.getPath()));
+    FSDataInputStream stream;
+    try {
+      // this method is async so that implementations may do async HEAD head
+      // requests. Not done in S3A/ABFS when a file status passed down (as is 
done here)
+      final CompletableFuture<FSDataInputStream> future = 
fs.openFile(stat.getPath())
+          .withFileStatus(stat)
+          .opt(OPENFILE_READ_POLICY_KEY, PARQUET_READ_POLICY)
+          .build();
+      stream = awaitFuture(future);
+    } catch (RuntimeException e) {
+      // S3A < 3.3.5 would raise illegal path exception if the openFile path 
didn't
+      // equal the path in the FileStatus; Hive virtual FS could create this 
condition.
+      // As the path to open is derived from stat.getPath(), this condition 
seems
+      // near-impossible to create -but is handled here for due diligence.
+      stream = fs.open(stat.getPath());

Review Comment:
   The main issue we hit is that 
   * filestatus includes full path
   * hive's wrapper fs has to create new fs instances with the wrapper fs path
   * and s3a was checking the source status matched that of the target path, 
raising IllegalArgumentException
   
   Hadoop 3.3.5+ only checks filename now
   
   How about
   * log at debug
   * add as suppressed.
   
   Exceptions are too important to lose after all.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to