[
https://issues.apache.org/jira/browse/FLINK-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566522#comment-14566522
]
ASF GitHub Bot commented on FLINK-2121:
---------------------------------------
GitHub user ggevay opened a pull request:
https://github.com/apache/flink/pull/752
[FLINK-2121] Fix the summation in FileInputFormat.addFilesInDir
Removed the length parameter, so the length calculation now starts from 0
in every call.
I also added a second inner dir to the test, so now it catches this problem
with any directory listing order.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ggevay/flink dirSizeFix
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/752.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #752
----
commit 7fc86ce10ddc640126c7da8265403a815a30c2d2
Author: Gabor Gevay <[email protected]>
Date: 2015-05-31T11:27:15Z
[FLINK-2121] Fix the recursive summation in FileInputFormat.addFilesInDir
----
> FileInputFormat.addFilesInDir miscalculates total size
> ------------------------------------------------------
>
> Key: FLINK-2121
> URL: https://issues.apache.org/jira/browse/FLINK-2121
> Project: Flink
> Issue Type: Bug
> Components: Core
> Reporter: Gabor Gevay
> Assignee: Gabor Gevay
> Priority: Minor
>
> In FileInputFormat.addFilesInDir, the length variable should start from 0,
> because the caller always adds the return value to its own length (instead
> of just assigning it). So with the current version, the length accumulated
> before the call is counted twice in the result.
> mvn verify caught this for me just now. It has gone unnoticed so far because
> testGetStatisticsMultipleNestedFiles catches it only if it gets the listing
> of the outer directory in a certain order. Concretely, if the inner
> directory is seen before the other file in the outer directory, then
> length is 0 at that point, so the bug doesn't show. But if the other file is
> seen first, then its size is added twice to the total result.
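To make the order dependence concrete, here is a minimal, self-contained sketch of the pattern described in the issue. It is not the actual Flink code: the Entry class and the buggy/fixed method names are hypothetical stand-ins, and the directory tree is held in memory instead of being read from a file system.

```java
import java.util.Arrays;
import java.util.List;

class DirSizeSketch {
    /** Hypothetical in-memory stand-in for a directory entry; not a Flink class. */
    static final class Entry {
        final long len;              // size in bytes, meaningful for files only
        final List<Entry> children;  // null for a plain file

        private Entry(long len, List<Entry> children) {
            this.len = len;
            this.children = children;
        }

        static Entry file(long len) { return new Entry(len, null); }
        static Entry dir(Entry... children) { return new Entry(0, Arrays.asList(children)); }
        boolean isDir() { return children != null; }
    }

    /** Buggy shape: the running total is passed down AND the return value is added to it. */
    static long buggy(Entry dir, long length) {
        for (Entry e : dir.children) {
            if (e.isDir()) {
                // The callee starts from 'length' and returns it plus its own files,
                // so everything counted so far is added a second time here.
                length += buggy(e, length);
            } else {
                length += e.len;
            }
        }
        return length;
    }

    /** Fixed shape: no length parameter, every call starts counting from 0. */
    static long fixed(Entry dir) {
        long length = 0;
        for (Entry e : dir.children) {
            length += e.isDir() ? fixed(e) : e.len;
        }
        return length;
    }

    public static void main(String[] args) {
        // Same contents (10 + 5 bytes), two possible listing orders of the outer directory.
        Entry fileFirst = Entry.dir(Entry.file(10), Entry.dir(Entry.file(5)));
        Entry dirFirst = Entry.dir(Entry.dir(Entry.file(5)), Entry.file(10));

        System.out.println(buggy(fileFirst, 0)); // 25: the 10-byte file is counted twice
        System.out.println(buggy(dirFirst, 0));  // 15: length was still 0 at the recursive call
        System.out.println(fixed(fileFirst));    // 15
        System.out.println(fixed(dirFirst));     // 15
    }
}
```

With a single inner directory, a test only fails under the file-first listing order, which matches the behaviour reported above.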