[ 
https://issues.apache.org/jira/browse/HADOOP-19389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17913721#comment-17913721
 ] 

ASF GitHub Bot commented on HADOOP-19389:
-----------------------------------------

steveloughran commented on code in PR #7291:
URL: https://github.com/apache/hadoop/pull/7291#discussion_r1918456038


##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/shell/TestTextCommand.java:
##########
@@ -67,39 +75,194 @@ public void testDisplayForAvroFiles() throws Exception {
     assertEquals(expectedOutput, output);
   }
 
+  @Test
+  public void testEmptyAvroFile() throws Exception {

Review Comment:
   I'm going to suggest something different: add an abstract contract test for 
this command and implement it in file:, hdfs, s3a and azure -then in gcs once 
that is in.
   
   That way we can see how well/badly those stores cope with this



##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/shell/TestTextCommand.java:
##########
@@ -67,39 +75,194 @@ public void testDisplayForAvroFiles() throws Exception {
     assertEquals(expectedOutput, output);
   }
 
+  @Test
+  public void testEmptyAvroFile() throws Exception {
+    String output = readUsingTextCommand(AVRO_FILENAME,
+                                         generateEmptyAvroBinaryData());
+    assertTrue(output.isEmpty());
+  }
+
+  @Test(expected = NullPointerException.class)
+  public void testAvroFileInputStreamNullBuffer() throws Exception {
+    createFile(AVRO_FILENAME, generateWeatherAvroBinaryData());
+    URI uri = new URI(AVRO_FILENAME);
+    Configuration conf = new Configuration();
+    try (InputStream is = getInputStream(uri, conf)) {
+      is.read(null, 0, 10);
+    }
+  }
+
+  @Test(expected = IndexOutOfBoundsException.class)
+  public void testAvroFileInputStreamNegativePosition() throws Exception {
+    createFile(AVRO_FILENAME, generateWeatherAvroBinaryData());
+    URI uri = new URI(AVRO_FILENAME);
+    Configuration conf = new Configuration();
+    try (InputStream is = getInputStream(uri, conf)) {
+      is.read(new byte[10], -1, 10);
+    }
+  }
+
+  @Test(expected = IndexOutOfBoundsException.class)
+  public void testAvroFileInputStreamTooLong() throws Exception {
+    createFile(AVRO_FILENAME, generateWeatherAvroBinaryData());
+    URI uri = new URI(AVRO_FILENAME);
+    Configuration conf = new Configuration();
+    try (InputStream is = getInputStream(uri, conf)) {
+      is.read(new byte[10], 0, 11);
+    }
+  }
+
+  @Test
+  public void testAvroFileInputStreamZeroLengthRead() throws Exception {
+    createFile(AVRO_FILENAME, generateWeatherAvroBinaryData());
+    URI uri = new URI(AVRO_FILENAME);
+    Configuration conf = new Configuration();
+    try (InputStream is = getInputStream(uri, conf)) {
+      assertEquals(0, is.read(new byte[10], 0, 0));

Review Comment:
   prefer AssertJ for new code



##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/shell/TestTextCommand.java:
##########
@@ -47,13 +50,18 @@ public class TestTextCommand {
     new File(TEST_ROOT_DIR, "weather.avro").toURI().getPath();
   private static final String TEXT_FILENAME =
     new File(TEST_ROOT_DIR, "testtextfile.txt").toURI().getPath();
+  private static final String SEQUENCE_FILENAME =
+    new File(TEST_ROOT_DIR, "NonWritableSequenceFile").toURI().getPath();
 
   private static final String SEPARATOR = System.getProperty("line.separator");
 
+  @Rule

Review Comment:
   if you make this a contract test, there's a rule in the superclass along 
with some thread naming



##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/shell/TestTextCommand.java:
##########
@@ -67,39 +75,194 @@ public void testDisplayForAvroFiles() throws Exception {
     assertEquals(expectedOutput, output);
   }
 
+  @Test
+  public void testEmptyAvroFile() throws Exception {
+    String output = readUsingTextCommand(AVRO_FILENAME,
+                                         generateEmptyAvroBinaryData());
+    assertTrue(output.isEmpty());
+  }
+
+  @Test(expected = NullPointerException.class)
+  public void testAvroFileInputStreamNullBuffer() throws Exception {
+    createFile(AVRO_FILENAME, generateWeatherAvroBinaryData());
+    URI uri = new URI(AVRO_FILENAME);
+    Configuration conf = new Configuration();
+    try (InputStream is = getInputStream(uri, conf)) {
+      is.read(null, 0, 10);
+    }
+  }
+
+  @Test(expected = IndexOutOfBoundsException.class)
+  public void testAvroFileInputStreamNegativePosition() throws Exception {
+    createFile(AVRO_FILENAME, generateWeatherAvroBinaryData());
+    URI uri = new URI(AVRO_FILENAME);
+    Configuration conf = new Configuration();
+    try (InputStream is = getInputStream(uri, conf)) {
+      is.read(new byte[10], -1, 10);
+    }
+  }
+
+  @Test(expected = IndexOutOfBoundsException.class)
+  public void testAvroFileInputStreamTooLong() throws Exception {
+    createFile(AVRO_FILENAME, generateWeatherAvroBinaryData());
+    URI uri = new URI(AVRO_FILENAME);
+    Configuration conf = new Configuration();
+    try (InputStream is = getInputStream(uri, conf)) {
+      is.read(new byte[10], 0, 11);
+    }
+  }
+
+  @Test
+  public void testAvroFileInputStreamZeroLengthRead() throws Exception {
+    createFile(AVRO_FILENAME, generateWeatherAvroBinaryData());
+    URI uri = new URI(AVRO_FILENAME);

Review Comment:
   create a constant field for this as it is used everywhere



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Display.java:
##########
@@ -237,30 +237,69 @@ public TextRecordInputStream(FileStatus f) throws 
IOException {
     public int read() throws IOException {
       int ret;
       if (null == inbuf || -1 == (ret = inbuf.read())) {
-        key = r.next(key);
-        if (key == null) {
-          return -1;
+        if (!readNextFromSequenceFile()) {
+          ret = -1;
         } else {
-          val = r.getCurrentValue(val);
+          ret = inbuf.read();
         }
-        byte[] tmp = key.toString().getBytes(StandardCharsets.UTF_8);
-        outbuf.write(tmp, 0, tmp.length);
-        outbuf.write('\t');
-        tmp = val.toString().getBytes(StandardCharsets.UTF_8);
-        outbuf.write(tmp, 0, tmp.length);
-        outbuf.write('\n');
-        inbuf.reset(outbuf.getData(), outbuf.getLength());
-        outbuf.reset();
-        ret = inbuf.read();
       }
       return ret;
     }
 
+    @Override
+    public int read(byte[] dest, int destPos, int destLen) throws IOException {
+      if (dest == null) {
+        throw new NullPointerException();

Review Comment:
   nit: add text to the message.



##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/shell/TestTextCommand.java:
##########
@@ -67,39 +75,194 @@ public void testDisplayForAvroFiles() throws Exception {
     assertEquals(expectedOutput, output);
   }
 
+  @Test
+  public void testEmptyAvroFile() throws Exception {
+    String output = readUsingTextCommand(AVRO_FILENAME,
+                                         generateEmptyAvroBinaryData());
+    assertTrue(output.isEmpty());
+  }
+
+  @Test(expected = NullPointerException.class)
+  public void testAvroFileInputStreamNullBuffer() throws Exception {
+    createFile(AVRO_FILENAME, generateWeatherAvroBinaryData());
+    URI uri = new URI(AVRO_FILENAME);
+    Configuration conf = new Configuration();
+    try (InputStream is = getInputStream(uri, conf)) {
+      is.read(null, 0, 10);
+    }
+  }
+
+  @Test(expected = IndexOutOfBoundsException.class)
+  public void testAvroFileInputStreamNegativePosition() throws Exception {
+    createFile(AVRO_FILENAME, generateWeatherAvroBinaryData());
+    URI uri = new URI(AVRO_FILENAME);
+    Configuration conf = new Configuration();
+    try (InputStream is = getInputStream(uri, conf)) {
+      is.read(new byte[10], -1, 10);
+    }
+  }
+
+  @Test(expected = IndexOutOfBoundsException.class)
+  public void testAvroFileInputStreamTooLong() throws Exception {
+    createFile(AVRO_FILENAME, generateWeatherAvroBinaryData());
+    URI uri = new URI(AVRO_FILENAME);
+    Configuration conf = new Configuration();
+    try (InputStream is = getInputStream(uri, conf)) {
+      is.read(new byte[10], 0, 11);
+    }
+  }
+
+  @Test
+  public void testAvroFileInputStreamZeroLengthRead() throws Exception {
+    createFile(AVRO_FILENAME, generateWeatherAvroBinaryData());
+    URI uri = new URI(AVRO_FILENAME);
+    Configuration conf = new Configuration();
+    try (InputStream is = getInputStream(uri, conf)) {
+      assertEquals(0, is.read(new byte[10], 0, 0));
+    }
+  }
+
+  @Test
+  public void testAvroFileInputStreamConsistentEOF() throws Exception {
+    createFile(AVRO_FILENAME, generateWeatherAvroBinaryData());
+    URI uri = new URI(AVRO_FILENAME);
+    Configuration conf = new Configuration();
+    try (InputStream is = getInputStream(uri, conf)) {
+      inputStreamToString(is);
+      assertEquals(-1, is.read());
+      assertEquals(-1, is.read(new byte[10], 0, 10));
+    }
+  }
+
+  @Test
+  public void testAvroFileInputStreamSingleAndMultiByteReads() throws 
Exception {
+    createFile(AVRO_FILENAME, generateWeatherAvroBinaryData());
+    URI uri = new URI(AVRO_FILENAME);
+    Configuration conf = new Configuration();
+    try (InputStream is1 = getInputStream(uri, conf);
+        InputStream is2 = getInputStream(uri, conf)) {
+      String multiByteReads = inputStreamToString(is1);
+      String singleByteReads = inputStreamSingleByteReadsToString(is2);
+      assertEquals(multiByteReads, singleByteReads);
+    }
+  }
+
   /**
    * Tests that a zero-length file is displayed correctly.
    */
-  @Test (timeout = 30000)
-  public void testEmptyTextFil() throws Exception {
+  @Test
+  public void testEmptyTextFile() throws Exception {
     byte[] emptyContents = { };
     String output = readUsingTextCommand(TEXT_FILENAME, emptyContents);
-    assertTrue("".equals(output));
+    assertTrue(output.isEmpty());

Review Comment:
   use assertJ
   ```
   Assertions.assertThat(output)
     .describeAs("output")
     .isEmpty()
   ```
   



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Display.java:
##########
@@ -286,31 +326,98 @@ public AvroFileInputStream(FileStatus status) throws 
IOException {
       writer = new GenericDatumWriter<Object>(schema);
       output = new ByteArrayOutputStream();
       encoder = EncoderFactory.get().jsonEncoder(schema, output);
+      finalSeparator = 
System.getProperty("line.separator").getBytes(StandardCharsets.UTF_8);
     }
 
     /**
      * Read a single byte from the stream.
      */
     @Override
     public int read() throws IOException {
+      if (buffer == null) {
+        return -1;
+      }
+
       if (pos < buffer.length) {
         return buffer[pos++];
       }
+
       if (!fileReader.hasNext()) {
+        // Unset buffer to signal EOF on future calls.
+        buffer = null;
         return -1;
       }
+
       writer.write(fileReader.next(), encoder);
       encoder.flush();
+
       if (!fileReader.hasNext()) {
-        // Write a new line after the last Avro record.
-        output.write(System.getProperty("line.separator")
-                         .getBytes(StandardCharsets.UTF_8));
-        output.flush();
+        if (buffer.length > 0) {
+          // Write a new line after the last Avro record.
+          output.write(finalSeparator);
+          output.flush();
+        }
+      }
+
+      swapBuffer();
+      return read();
+    }
+
+    @Override
+    public int read(byte[] dest, int destPos, int destLen) throws IOException {
+      if (dest == null) {

Review Comment:
   bit of duplication -could be pulled out and used above too





> Optimize shell -text command I/O with multi-byte read.
> ------------------------------------------------------
>
>                 Key: HADOOP-19389
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19389
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs, fs/azure, fs/gcs, fs/s3
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>            Priority: Minor
>              Labels: pull-request-available
>
> {{hadoop fs -text}} reads Avro files and sequence files by internally 
> wrapping the stream in 
> [{{AvroFileInputStream}}|https://github.com/apache/hadoop/blob/rel/release-3.4.1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Display.java#L270]
>  or 
> [{{TextRecordInputStream}}|https://github.com/apache/hadoop/blob/rel/release-3.4.1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Display.java#L217].
>  These classes implement the required single-byte 
> [{{read()}}|https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/io/InputStream.html#read()],
>  but not the optional multi-byte buffered [{{read(byte[], int, 
> int)}}|https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/io/InputStream.html#read(byte%5B%5D,int,int)].
>  The default implementation in the JDK is a [loop over single-byte 
> read|https://github.com/openjdk/jdk11u-dev/blob/a47c72fad455bfdf9053cb8e94c99e73965ab50d/src/java.base/share/classes/java/io/InputStream.java#L280],
>  which causes sub-optimal I/O and method call overhead. We can optimize this 
> by overriding the multi-byte read method.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to