[ 
https://issues.apache.org/jira/browse/HDFS-7151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Rewoonenco updated HDFS-7151:
------------------------------------
    Affects Version/s: 3.0.0

> DFSInputStream method seek works incorrectly on huge HDFS block size
> --------------------------------------------------------------------
>
>                 Key: HDFS-7151
>                 URL: https://issues.apache.org/jira/browse/HDFS-7151
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, fuse-dfs, hdfs-client
>    Affects Versions: 3.0.0, 2.3.0, 2.4.0, 2.5.0, 2.4.1, 2.5.1
>         Environment: dfs.block.size > 2Gb
>            Reporter: Andrew Rewoonenco
>            Priority: Critical
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Hadoop does not work correctly with block sizes larger than 2 GB.
> The seek method of the DFSInputStream class uses an int (32-bit signed) value 
> internally for seeking inside the current block. This causes seek errors when 
> the block size is greater than 2 GB.
> Found when using very large Parquet files (10 GB) in Impala on a Cloudera 
> cluster with a 10 GB block size.
> Here is some log output:
> W0924 08:27:15.920017 40026 DFSInputStream.java:1397] BlockReader failed to 
> seek to 4390830898. Instead, it seeked to 95863602.
> W0924 08:27:15.921295 40024 DFSInputStream.java:1397] BlockReader failed to 
> seek to 5597521814. Instead, it seeked to 1302554518.
> The BlockReader seeks using only 32-bit offsets: 4390830898 - 95863602 = 
> 4294967296 = 2^32, and likewise 5597521814 - 1302554518.
> The code fragment producing that bug:
>     int diff = (int)(targetPos - pos);
>     if (diff <= blockReader.available()) {
> Similar errors may exist in other parts of HDFS.
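> A minimal sketch of a long-based fix for that fragment (illustrative only, 
> assuming the surrounding seek logic; not a committed patch):
>     // (int)(targetPos - pos) wraps for differences >= 2^31: a target of
>     // 4390830898 with pos near 0 collapses to 95863602, off by exactly 2^32,
>     // which matches the log lines above.
>     long diff = targetPos - pos;            // keep the full 64-bit difference
>     if (diff <= blockReader.available()) {  // available()'s int result is promoted to long here
>       // seek within the current block using the long-valued diff
>     }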



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
