Hello,


It looks like HDFS caching is not working as expected.

The log file being cached is around 200 MB. The Hadoop cluster has 3 nodes,
each with 4 GB of memory.



-bash-4.1$ hdfs cacheadmin -addPool wptest1

Successfully added cache pool wptest1.



-bash-4.1$ /hadoop/hadoop-2.3.0/bin/hdfs cacheadmin -listPools

Found 1 result.

NAME     OWNER  GROUP  MODE            LIMIT  MAXTTL

wptest1  hdfs   hdfs   rwxr-xr-x   unlimited   never



-bash-4.1$ hdfs cacheadmin -addDirective -path hadoop003.log -pool wptest1

Added cache directive 1



-bash-4.1$  time /hadoop/hadoop-2.3.0/bin/hadoop fs -tail hadoop003.log

real    0m2.796s

user    0m4.263s

sys     0m0.203s



-bash-4.1$  time /hadoop/hadoop-2.3.0/bin/hadoop fs -tail hadoop003.log

real    0m3.050s

user    0m4.176s

sys     0m0.192s



*It is weird that the cache status shows 0 bytes cached:*

-bash-4.1$ /hadoop/hadoop-2.3.0/bin/hdfs cacheadmin -listDirectives -stats -path hadoop003.log -pool wptest1

Found 1 entry

 ID POOL     REPL EXPIRY  PATH                      BYTES_NEEDED  *BYTES_CACHED*  FILES_NEEDED  FILES_CACHED

  1 wptest1     1 never   /user/hdfs/hadoop003.log     209715206             *0*             1             0
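
To re-check this after each attempt, a small filter along these lines could flag any directive whose cached bytes are below what is needed. It is only a sketch: `report_uncached` is a made-up helper name, and it assumes the column order shown above (the last four fields are BYTES_NEEDED, BYTES_CACHED, FILES_NEEDED, FILES_CACHED) and a path without spaces.

```shell
# Hypothetical helper (not part of Hadoop): reads the output of
# `hdfs cacheadmin -listDirectives -stats` on stdin and prints one line
# for every directive whose BYTES_CACHED is lower than BYTES_NEEDED.
# Assumption: rows start with a numeric ID and have at least 9 fields,
# with the path in field 5 and the four stats columns at the end.
report_uncached() {
  awk '$1 ~ /^[0-9]+$/ && NF >= 9 {
    needed = $(NF-3)   # BYTES_NEEDED
    cached = $(NF-2)   # BYTES_CACHED
    if (cached + 0 < needed + 0)
      printf "directive %s (%s): %d of %d bytes cached\n", $1, $5, cached, needed
  }'
}
```

It would be used as `hdfs cacheadmin -listDirectives -stats | report_uncached`; no output means every listed directive is fully cached.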



-bash-4.1$ file /hadoop/hadoop-2.3.0/lib/native/libhadoop.so.1.0.0

/hadoop/hadoop-2.3.0/lib/native/libhadoop.so.1.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped



I also tried the word count example with the same file. With the cache
directive in place the job always takes 40 seconds, versus 42 seconds for
the same map/reduce job without caching.

Is there anything wrong with this setup?

Thanks a lot
