[
https://issues.apache.org/jira/browse/HDFS-6608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047504#comment-14047504
]
Tony Reix commented on HDFS-6608:
---------------------------------
See defect https://issues.apache.org/jira/browse/HDFS-6515, which was
opened against version 2.4.0.
> FsDatasetCache: hard-coded 4096 value in test is not appropriate for all HW
> ---------------------------------------------------------------------------
>
> Key: HDFS-6608
> URL: https://issues.apache.org/jira/browse/HDFS-6608
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: test
> Affects Versions: 3.0.0
> Environment: PPC64 (LE & BE, OpenJDK & IBM JVM, Ubuntu, RHEL 7 & RHEL
> 6.5)
> Reporter: Tony Reix
>
> The value 4096 is hard-coded in HDFS code (product and tests).
> It appears 171 times, 8 of which are in product (non-test) code:
> hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs : 163
> hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs : 4
> hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http : 3
> hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/wsrs : 1
> This value deals with different subjects: files, block size, page size, etc.
> 4096 (as block size and page size) is appropriate for many systems, but not
> for PPC64, for which it is 65536.
> Looking at the HDFS product (non-test) code, it seems (though not 100%
> sure) that the code is OK (not using a hard-coded page/block size); for
> example, the limit is taken from configuration:
> this.maxBytes = dataset.datanode.getDnConf().getMaxLockedMemory();
> However, someone should check this in depth.
> At the test level, however, the value 4096 is used in many places, and it
> is very hard to tell whether or not a given occurrence depends on the HW
> architecture.
> In the TestFsDatasetCache#testPageRounder test, the HW value is sometimes
> obtained from the system:
> private static final long PAGE_SIZE =
> NativeIO.POSIX.getCacheManipulator().getOperatingSystemPageSize();
> private static final long BLOCK_SIZE = PAGE_SIZE;
> but there are several places where 4096 is used even though it should
> depend on the HW value, for example:
> conf.setLong(DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_KEY,
> CACHE_CAPACITY);
> With:
> // Most Linux installs allow a default of 64KB locked memory
> private static final long CACHE_CAPACITY = 64 * 1024;
> However, for PPC64, this value should be much bigger.
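> A minimal sketch of what a HW-aware capacity could look like (the 16-page
> multiplier is an illustrative assumption, not a value taken from the
> existing test):
> private static final long PAGE_SIZE =
> NativeIO.POSIX.getCacheManipulator().getOperatingSystemPageSize();
> // Size the lockable cache as a multiple of the HW page size instead of
> // a fixed 64 KB, so it holds the same number of pages on
> // 65536-byte-page PPC64 as on 4096-byte-page x86_64.
> private static final long CACHE_CAPACITY = 16 * PAGE_SIZE;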
> The TestFsDatasetCache#testPageRounder test aims to cache 5 pages of size
> 512. However, the page size is 65536 on PPC64 and 4096 on x86_64. Thus,
> the method in charge of reserving blocks in the HDFS cache will reserve
> in 4096-byte steps on x86_64 and in 65536-byte steps on PPC64, against a
> hard-coded limit:
> maxBytes = 65536 bytes
> 5 * 4096 = 20480 : OK
> 5 * 65536 = 327680 : KO : the test ends in a timeout, since the limit is
> exceeded from the very beginning and the test keeps waiting.
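> To make the failure mode concrete, a sketch of the arithmetic (maxBytes
> here stands in for the 64 KB CACHE_CAPACITY quoted above):
> long maxBytes = 64 * 1024; // 65536: the hard-coded locked-memory limit
> long x86Need = 5 * 4096;   // 20480 <= 65536: reservations fit, test passes
> long ppcNeed = 5 * 65536;  // 327680 > 65536: can never be reserved, so
>                            // the test waits forever and finally times out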
> In conclusion, there are several issues to fix:
> - instead of many hard-coded 4096 values, the code (mainly the tests)
> should use Java constants built from HW values (such as
> NativeIO.POSIX.getCacheManipulator().getOperatingSystemPageSize())
> - several distinct constants must be used, since 4096 covers different
> subjects, including some that do not depend on the HW
> - the test must be improved to handle cases where the limit is exceeded
> from the very beginning (see the sketch after this list)
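> For the last point, one possible shape, as a sketch only (this assumes
> JUnit 4's org.junit.Assume, which the HDFS test suite uses; NUM_PAGES is
> a hypothetical constant standing for the 5 pages the test caches):
> // Skip the test up front, rather than timing out, when the requested
> // cache can never fit under the configured locked-memory limit.
> org.junit.Assume.assumeTrue(NUM_PAGES * PAGE_SIZE <= CACHE_CAPACITY);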
--
This message was sent by Atlassian JIRA
(v6.2#6252)