Todd Grayson created HDFS-4829:
----------------------------------

             Summary: Strange loss of data displayed in hadoop fs -tail command when data is separated by periods?
                 Key: HDFS-4829
                 URL: https://issues.apache.org/jira/browse/HDFS-4829
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 2.0.1-alpha, 2.0.0-alpha
         Environment: OS Centos 6.3 (on Intel Core2 Duo, VMware Player VM running under windows 7)
                      Testing on both 2.0.0-cdh4.1.1 and 2.0.1-cdh4.1.2
            Reporter: Todd Grayson
            Priority: Minor

The hadoop fs -tail command behaves strangely. Its default seems to be 9 lines of output versus 10 lines in the OS version of the command (minor issue). The stranger thing (a bug?) is that it appears to drop the initial octet from an IP address when examining a file over HDFS.

[training@localhost hands-on]$ hadoop fs -tail weblog/access_log
.190.174.142 - - [03/Dec/2011:13:28:08 -0800] "GET /assets/js/javascript_combined.js HTTP/1.1" 200 20404
10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1" 200 3892
10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 109379
10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657

*When looking at the original log data outside of HDFS with the OS version of the tail command, we see the following*

[training@localhost hands-on]$ hadoop fs -get weblog/access_log ./
[training@localhost hands-on]$ tail access_log
10.190.174.142 - - [03/Dec/2011:13:28:06 -0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
10.190.174.142 - - [03/Dec/2011:13:28:08 -0800] "GET /assets/js/javascript_combined.js HTTP/1.1" 200 20404
10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1" 200 3892
10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 109379
10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657

When using non-IP data separated by periods, it gets even worse and even more data appears to be masked (same data, substituting names for IP octets). Note that we lose the first line well into the URI string.

[training@localhost hands-on]$ hadoop fs -tail weblog/test_log
s/javascript_combined.js HTTP/1.1" 200 20404
larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1" 200 3892
larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 larry.379
larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657

Verifying against the normal tail output shows that the data matches. Note that the first line is not represented in the hadoop fs -tail output, as it's only grabbing 9 lines instead of 10, as I mentioned before. Align the two text-based examples along the javascript_combined line.

[training@localhost hands-on]$ tail test_log
larry.billy.will.amy - - [03/Dec/2011:13:28:06 -0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
larry.billy.will.amy - - [03/Dec/2011:13:28:08 -0800] "GET /assets/js/javascript_combined.js HTTP/1.1" 200 20404
larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1" 200 3892
larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 larry.379
larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657
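
One possible reading of the output above, offered as a hypothesis rather than a confirmed diagnosis: the FileSystem Shell documentation describes hadoop fs -tail as displaying the last kilobyte of the file, while the OS tail defaults to the last 10 complete lines. A fixed byte window would start wherever the 1024th byte from the end happens to fall, which could explain both the partial first line and the lower line count, and why longer lines (names instead of IP octets) push the cut further into the first line. The sketch below only illustrates that difference in plain Python; the helper names `tail_bytes` and `tail_lines` and the repeated sample line are hypothetical and are not taken from the Hadoop source.

```python
def tail_bytes(data: bytes, window: int = 1024) -> bytes:
    """Return the last `window` bytes, ignoring line boundaries."""
    return data[-window:] if len(data) > window else data


def tail_lines(data: bytes, count: int = 10) -> bytes:
    """Return the last `count` complete lines, like the OS tail default."""
    return b"".join(data.splitlines(keepends=True)[-count:])


# Hypothetical sample: one log line shaped like those in this report,
# repeated so that the file is much larger than the byte window.
line = (b'10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] '
        b'"GET /assets/img/home-logo.png HTTP/1.1" 200 3892\n')
sample = line * 50

byte_view = tail_bytes(sample)
line_view = tail_lines(sample)

# The byte window usually begins mid-line, so its first "line" is a fragment.
print(byte_view.splitlines()[0])
# The line-based view always begins at a line boundary.
print(line_view.splitlines()[0])
```

If this reading is right, running tail -c 1024 on the local copies of access_log and test_log should reproduce the hadoop fs -tail output shown above.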