[GitHub] flink pull request #4019: [FLINK-6776] [runtime] Use skip instead of seek fo...

StefanRRichter Tue, 13 Jun 2017 03:03:59 -0700

Github user StefanRRichter commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4019#discussion_r121630151
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/fs/hdfs/HadoopDataInputStream.java ---
    @@ -31,11 +31,15 @@
      */
     public final class HadoopDataInputStream extends FSDataInputStream {
     
    +   /** Minimum amount of bytes to skip forward before we issue a seek instead of discarding read */
    +   private static final int MIN_SKIP_BYTES = 1024 * 1024;
    --- End diff --
    
    Right now, this is purely a "magic" number. The optimum should depend on
    the DFS and the underlying file system. For now, the number is chosen
    "big enough" to provide improvements for smaller seeks and "small enough"
    to avoid disadvantages compared to real seeks. While the minimum should be
    the page size, the true optimum per system would be the number of bytes
    that can be consumed within the seek time. Unfortunately, seek time is not
    constant, and devices as well as the DFS may also use read buffers and
    read-ahead. In the long run this value could become configurable, but for
    now I have simply chosen a conservative, relatively small value that
    should bring safe improvements for small skips in metadata, where real
    seeks would hurt the most.
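
To make the trade-off concrete, here is a minimal sketch of the skip-or-seek decision, assuming an in-memory byte array in place of a real DFS stream. Only MIN_SKIP_BYTES and its value come from the diff; the class SkipOrSeekStream, its methods, and the realSeeks counter are hypothetical names used for illustration and are not the actual HadoopDataInputStream code.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical illustration only; not the actual HadoopDataInputStream code. */
final class SkipOrSeekStream {

    /** Threshold from the diff: forward jumps of up to 1 MiB are skipped, not sought. */
    static final int MIN_SKIP_BYTES = 1024 * 1024;

    private final byte[] data;  // stands in for the remote file (assumption)
    private InputStream in;     // stands in for the open DFS stream (assumption)
    private long pos;           // current read position
    int realSeeks;              // counts how often the "real" seek path is taken

    SkipOrSeekStream(byte[] data) {
        this.data = data;
        this.in = new ByteArrayInputStream(data);
    }

    /** Small forward jumps read and discard bytes; everything else does a real seek. */
    void seek(long target) throws IOException {
        long delta = target - pos;
        if (delta > 0 && delta <= MIN_SKIP_BYTES) {
            skipFully(delta);
        } else if (delta != 0) {
            forceSeek(target);
        }
        // delta == 0: already at the target, nothing to do
    }

    /** InputStream.skip may skip fewer bytes than requested, so loop until done. */
    private void skipFully(long bytes) throws IOException {
        while (bytes > 0) {
            long skipped = in.skip(bytes);
            if (skipped <= 0) {
                throw new EOFException("Could not skip the remaining " + bytes + " bytes");
            }
            bytes -= skipped;
            pos += skipped;
        }
    }

    /** Models an expensive seek by re-opening the stream at the target offset. */
    private void forceSeek(long target) throws IOException {
        if (target < 0 || target > data.length) {
            throw new IOException("Seek target out of range: " + target);
        }
        in = new ByteArrayInputStream(data, (int) target, data.length - (int) target);
        pos = target;
        realSeeks++;
    }

    /** Reads one byte, or returns -1 at end of stream. */
    int read() throws IOException {
        int b = in.read();
        if (b >= 0) {
            pos++;
        }
        return b;
    }

    long getPos() {
        return pos;
    }
}
```

In this sketch, a forward jump of 10 bytes goes through skipFully, while a 2 MiB forward jump or any backward jump falls back to forceSeek. Making the threshold configurable, as suggested above, would amount to replacing the constant with a constructor parameter.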



