[ https://issues.apache.org/jira/browse/MAPREDUCE-5853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977383#comment-13977383 ]
Jason Dere commented on MAPREDUCE-5853:
---------------------------------------
It looks like FileSystem's implementation of getContentSummary() is built
entirely on getFileStatus()/listStatus(). If we remove the getContentSummary()
override in FilterFileSystem and fall back to the FileSystem implementation,
would this work correctly, given that FilterFileSystem already overrides
getFileStatus()/listStatus()?
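
To make the question concrete, here is a minimal toy model of the two
dispatch paths. None of these classes are Hadoop's: ToyFileSystem,
ToyRawFileSystem, ToyFilterFileSystem, ToyChecksumFileSystem and the
delegateSummary flag are hypothetical stand-ins, the summary is reduced to a
single total length, and CRC files are matched by a plain ".crc" suffix. It
only illustrates how virtual dispatch would behave in such a model and says
nothing about whether the real change is safe for every FilterFileSystem
subclass.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: every class here is a hypothetical stand-in, not Hadoop's API.
final class ContentSummarySketch {

    /** Stand-in for FileSystem: the summary is built only from listStatus(). */
    abstract static class ToyFileSystem {
        abstract List<String> listStatus();
        abstract long getLength(String name);

        long getContentSummary() {
            long total = 0;
            for (String name : listStatus()) {
                total += getLength(name);
            }
            return total;
        }
    }

    /** Stand-in for the raw underlying file system; it hides nothing. */
    static class ToyRawFileSystem extends ToyFileSystem {
        private final Map<String, Long> files = new LinkedHashMap<>();

        void add(String name, long length) { files.put(name, length); }

        @Override List<String> listStatus() { return new ArrayList<>(files.keySet()); }
        @Override long getLength(String name) { return files.get(name); }
    }

    /** Stand-in for FilterFileSystem; the flag toggles the override in question. */
    static class ToyFilterFileSystem extends ToyFileSystem {
        final ToyFileSystem fs;
        final boolean delegateSummary;

        ToyFilterFileSystem(ToyFileSystem fs, boolean delegateSummary) {
            this.fs = fs;
            this.delegateSummary = delegateSummary;
        }

        @Override List<String> listStatus() { return fs.listStatus(); }
        @Override long getLength(String name) { return fs.getLength(name); }

        @Override long getContentSummary() {
            // true: ask the wrapped fs directly, bypassing this object's listStatus().
            // false: use the inherited algorithm, which calls this.listStatus().
            return delegateSummary ? fs.getContentSummary() : super.getContentSummary();
        }
    }

    /** Stand-in for ChecksumFileSystem: listStatus() hides *.crc entries. */
    static class ToyChecksumFileSystem extends ToyFilterFileSystem {
        ToyChecksumFileSystem(ToyFileSystem fs, boolean delegateSummary) {
            super(fs, delegateSummary);
        }

        @Override List<String> listStatus() {
            List<String> visible = new ArrayList<>();
            for (String name : fs.listStatus()) {
                if (!name.endsWith(".crc")) {
                    visible.add(name);
                }
            }
            return visible;
        }
    }

    /** Total length seen through the checksum wrapper for one data file plus its CRC. */
    static long summaryLength(boolean delegateSummary) {
        ToyRawFileSystem raw = new ToyRawFileSystem();
        raw.add("part-0", 100L);
        raw.add(".part-0.crc", 12L);
        return new ToyChecksumFileSystem(raw, delegateSummary).getContentSummary();
    }

    public static void main(String[] args) {
        System.out.println("delegating override: " + summaryLength(true));
        System.out.println("inherited base impl: " + summaryLength(false));
    }
}
```

In this model the delegating override reports 112 (the CRC file is counted),
while the inherited implementation reports 100, because it goes through the
wrapper's filtered listStatus(). One trade-off the toy does not capture: a
real underlying file system may have a cheaper getContentSummary() of its
own, which the fallback would no longer use.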
> ChecksumFileSystem.getContentSummary() including contents for crc files
> ------------------------------------------------------------------------
>
> Key: MAPREDUCE-5853
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5853
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Jason Dere
>
> Trying to track down some differences in Hive statistics between
> hadoop-1/hadoop-2. It looks like although ChecksumFileSystem.listStatus()
> filters out CRC files, getContentSummary() falls back to using the
> FilterFileSystem.getContentSummary() implementation, which calls
> fs.getContentSummary(). The underlying fs may not have the same filters as
> the ChecksumFileSystem and so the CRC files can get included in the content
> summary.