mukund-thakur commented on a change in pull request #3501:
URL: https://github.com/apache/hadoop/pull/3501#discussion_r719416755



##########
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JsonSerialization.java
##########
@@ -229,30 +235,44 @@ public T fromInstance(T instance) throws IOException {
 
   /**
    * Load from a Hadoop filesystem.
-   * There's a check for data availability after the file is open, by
-   * raising an EOFException if stream.available == 0.
-   * This allows for a meaningful exception without the round trip overhead
-   * of a getFileStatus call before opening the file. It may be brittle
-   * against an FS stream which doesn't return a value here, but the
-   * standard filesystems all do.
-   * JSON parsing and mapping problems
-   * are converted to IOEs.
    * @param fs filesystem
    * @param path path
    * @return a loaded object
-   * @throws IOException IO or JSON parse problems
+   * @throws PathIOException JSON parse problem
+   * @throws IOException IO problems
    */
   public T load(FileSystem fs, Path path) throws IOException {
-    try (FSDataInputStream dataInputStream = fs.open(path)) {
-      // throw an EOF exception if there is no data available.
-      if (dataInputStream.available() == 0) {
-        throw new EOFException("No data in " + path);
-      }
+    return load(fs, path, null);
+  }
+
+  /**
+   * Load from a Hadoop filesystem.
+   * If a file status is supplied, it's passed in to the openFile()
+   * call so that FS implementations can optimize their opening.
+   * @param fs filesystem
+   * @param path path
+   * @param status status of the file to open.
+   * @return a loaded object
+   * @throws PathIOException JSON parse problem
+   * @throws EOFException file status references an empty file
+   * @throws IOException IO problems
+   */
+  public T load(FileSystem fs, Path path, @Nullable FileStatus status)
+      throws IOException {
+
+    if (status != null && status.getLen() == 0) {
+      throw new EOFException("No data in " + path);

Review comment:
       What happens when we don't pass the FileStatus? Will it still throw EOF 
or just a normal IOE?

##########
File path: 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestJsonSerialization.java
##########
@@ -181,5 +197,23 @@ public void testFileSystemEmptyPath() throws Throwable {
     }
   }
 
+  /**
+   * 0 byte file through the load(path, status) API will fail with an

Review comment:
       Should we add a test with 0 byte file but not passing the fileStatus. 

##########
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JsonSerialization.java
##########
@@ -229,30 +235,44 @@ public T fromInstance(T instance) throws IOException {
 
   /**
    * Load from a Hadoop filesystem.
-   * There's a check for data availability after the file is open, by
-   * raising an EOFException if stream.available == 0.
-   * This allows for a meaningful exception without the round trip overhead
-   * of a getFileStatus call before opening the file. It may be brittle
-   * against an FS stream which doesn't return a value here, but the
-   * standard filesystems all do.
-   * JSON parsing and mapping problems
-   * are converted to IOEs.
    * @param fs filesystem
    * @param path path
    * @return a loaded object
-   * @throws IOException IO or JSON parse problems
+   * @throws PathIOException JSON parse problem
+   * @throws IOException IO problems
    */
   public T load(FileSystem fs, Path path) throws IOException {
-    try (FSDataInputStream dataInputStream = fs.open(path)) {
-      // throw an EOF exception if there is no data available.
-      if (dataInputStream.available() == 0) {

Review comment:
       So we are delegating the clients to decide whether to throw EOF or not 
based on dataInputStream.available() method rather than throwing EOF ourselves. 
I think this is correct. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to