Tony Reix created HDFS-6608:
-------------------------------

             Summary: FsDatasetCache: hard-coded 4096 value in test is not appropriate for all HW
                 Key: HDFS-6608
                 URL: https://issues.apache.org/jira/browse/HDFS-6608
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: test
    Affects Versions: 3.0.0
         Environment: PPC64 (LE & BE, OpenJDK & IBM JVM, Ubuntu, RHEL 7 & RHEL 6.5)
            Reporter: Tony Reix


The value 4096 is hard-coded in HDFS code (product and tests).
It appears 171 times, including 8 times in product (non-test) code:
hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs : 163
hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs : 4
hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http : 3
hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/wsrs : 1

This value is used for different purposes: file sizes, block size, page size, etc.
As a block size and page size, 4096 is appropriate for many systems, but not for PPC64, where the page size is 65536.

Looking at the HDFS product (non-test) code, it seems (though not 100% sure) that the code is OK and does not use a hard-coded page/block size. However, someone should check this in depth. For example:

this.maxBytes = dataset.datanode.getDnConf().getMaxLockedMemory();
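
For reference, the limit quoted above is read from configuration (dfs.datanode.max.locked.memory), not derived from a hard-coded page size; roughly (a simplified sketch, not the exact DNConf/FsDatasetCache code):

 // Simplified sketch: the locked-memory limit comes from the
 // dfs.datanode.max.locked.memory setting, so no 4096 literal is involved here.
 long maxLockedMemory = conf.getLong(
     DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_KEY,
     DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_DEFAULT);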

At test level, however, the value 4096 is used in many places and it is very hard to tell whether or not it depends on the HW architecture.

In TestFsDatasetCache#testPageRounder, the HW value is sometimes obtained from the system:

 private static final long PAGE_SIZE =
     NativeIO.POSIX.getCacheManipulator().getOperatingSystemPageSize();
 private static final long BLOCK_SIZE = PAGE_SIZE;

but there are several places where 4096 is used even though it should depend on the HW value.

For example:

 conf.setLong(DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_KEY, CACHE_CAPACITY);

with:

 // Most Linux installs allow a default of 64KB locked memory
 private static final long CACHE_CAPACITY = 64 * 1024;

However, for PPC64 this value should be much bigger.
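
One possible way to make this capacity architecture-neutral (only a sketch, not the current test code) is to express it in pages instead of bytes:

 // Sketch: size the lockable test cache as a number of pages rather than a
 // fixed byte count, so it holds the same number of pages with 4 KB (x86_64)
 // and 64 KB (PPC64) pages.
 private static final long PAGE_SIZE =
     NativeIO.POSIX.getCacheManipulator().getOperatingSystemPageSize();
 private static final long CACHE_CAPACITY = 16 * PAGE_SIZE; // 64 KB on x86_64, 1 MB on PPC64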

The TestFsDatasetCache#testPageRounder test aims at caching 5 blocks of 512 bytes each. However, the page size is 65536 on PPC64 and 4096 on x86_64, so the method in charge of reserving blocks in the HDFS cache proceeds in 4096-byte steps on x86_64 and in 65536-byte steps on PPC64, with a hard-coded limit: maxBytes = 65536 bytes.

5 * 4096 = 20480 : OK
5 * 65536 = 327680 : KO : the test ends with a timeout, since the limit is exceeded right at the beginning and the test keeps waiting.
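
For context, the rounding behaviour that makes the reservation proceed in page-size steps can be sketched as follows (this is not the actual PageRounder code, just an illustration):

 // Illustration: a cached length is rounded up to the next multiple of the OS
 // page size, so even a 512-byte block reserves a full page: 4096 bytes on
 // x86_64, 65536 bytes on PPC64.
 static long roundUpToPageSize(long length, long pageSize) {
   return ((length + pageSize - 1) / pageSize) * pageSize;
 }
 // roundUpToPageSize(512, 4096)  -> 4096
 // roundUpToPageSize(512, 65536) -> 65536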

As a conclusion, there are several issues to fix:
 - instead of many hard-coded 4096 values, the code (mainly the tests) should use Java constants built from HW values (such as NativeIO.POSIX.getCacheManipulator().getOperatingSystemPageSize())
 - several distinct constants must be used, since 4096 covers different subjects, including some that do not depend on the HW
 - the test must be improved to handle the case where the limit is exceeded at the very beginning (see the sketch after this list)
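
As an illustration of the last point (hypothetical helper names, only a sketch): the test could derive its sizes from the OS page size and fail fast, instead of timing out, when the required reservation cannot fit into the configured locked-memory limit.

 import org.apache.hadoop.io.nativeio.NativeIO;

 // Hypothetical test helpers, for illustration only.
 private static final long PAGE_SIZE =
     NativeIO.POSIX.getCacheManipulator().getOperatingSystemPageSize();
 private static final int NUM_PAGES = 5;

 // Each cached block occupies a full page after rounding, so caching 5 blocks
 // needs 5 * PAGE_SIZE bytes: 20480 with 4 KB pages, 327680 with 64 KB pages.
 private static long requiredCacheCapacity() {
   return NUM_PAGES * PAGE_SIZE;
 }

 // Fail immediately with a clear message instead of waiting for a timeout
 // when the configured capacity cannot hold the pages the test will cache.
 private static void checkCapacity(long cacheCapacity) {
   if (requiredCacheCapacity() > cacheCapacity) {
     throw new IllegalStateException("Cache capacity " + cacheCapacity
         + " bytes is smaller than the " + requiredCacheCapacity()
         + " bytes needed to cache " + NUM_PAGES + " pages");
   }
 }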



--
This message was sent by Atlassian JIRA
(v6.2#6252)
