Todd Grayson created HDFS-4829:
----------------------------------
Summary: Strange loss of data displayed in hadoop fs -tail command
when data is separated by periods?
Key: HDFS-4829
URL: https://issues.apache.org/jira/browse/HDFS-4829
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs-client
Affects Versions: 2.0.1-alpha, 2.0.0-alpha
Environment: OS CentOS 6.3 (on Intel Core2 Duo, VMware Player VM
running under Windows 7)
Tested on both 2.0.0-cdh4.1.1 and 2.0.1-cdh4.1.2
Reporter: Todd Grayson
Priority: Minor
The hadoop fs -tail command behaves strangely. First, a minor issue: by default
it appears to print 9 lines of output, versus 10 lines for the OS version of
the command. Second, and stranger (possibly a bug): it appears to drop the
initial octet from an IP address when examining a file over HDFS.
[training@localhost hands-on]$ hadoop fs -tail weblog/access_log
.190.174.142 - - [03/Dec/2011:13:28:08 -0800] "GET
/assets/js/javascript_combined.js HTTP/1.1" 200 20404
10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png
HTTP/1.1" 200 3892
10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET
/images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET
/images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET
/images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET
/images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET
/images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET
/images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 109379
10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET
/images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657
When looking at the original log data outside of HDFS with the OS version of
the tail command, we see the following:
[training@localhost hands-on]$ hadoop fs -get weblog/access_log ./
[training@localhost hands-on]$ tail access_log
10.190.174.142 - - [03/Dec/2011:13:28:06 -0800] "GET
/images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
10.190.174.142 - - [03/Dec/2011:13:28:08 -0800] "GET
/assets/js/javascript_combined.js HTTP/1.1" 200 20404
10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png
HTTP/1.1" 200 3892
10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET
/images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET
/images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET
/images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET
/images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET
/images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET
/images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 109379
10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET
/images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657
When using non-IP data separated by periods, it gets even worse and even more
data is masked (same data, substituting names for IP octets). Note that we lose
the first line well into the URI string:
[training@localhost hands-on]$ hadoop fs -tail weblog/test_log
s/javascript_combined.js HTTP/1.1" 200 20404
larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET
/assets/img/home-logo.png HTTP/1.1" 200 3892
larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET
/images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET
/images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET
/images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET
/images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET
/images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET
/images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 larry.379
larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET
/images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657
Verifying against the normal tail output: note that the first line is not
represented in the hadoop fs -tail output, as it's only grabbing 9 lines
instead of 10, as mentioned before. Align the two text-based examples along
the javascript_combined line.
[training@localhost hands-on]$ tail test_log
larry.billy.will.amy - - [03/Dec/2011:13:28:06 -0800] "GET
/images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
larry.billy.will.amy - - [03/Dec/2011:13:28:08 -0800] "GET
/assets/js/javascript_combined.js HTTP/1.1" 200 20404
larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET
/assets/img/home-logo.png HTTP/1.1" 200 3892
larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET
/images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET
/images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET
/images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET
/images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET
/images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET
/images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 larry.379
larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET
/images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657
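One possible explanation, not confirmed in this report: the FileSystem shell
documentation describes -tail as displaying the last kilobyte of the file, so
the output may start at a byte offset rather than on a line boundary. That
would also fit the longer "larry.billy.will.amy" lines being cut further in.
Below is a minimal Python sketch of that assumption. The helper names and the
sample data are hypothetical, and the line length (114 bytes) was chosen only
so that the cut lands two bytes into a line.

```python
def tail_bytes(data: bytes, window: int = 1024) -> bytes:
    """Return the last `window` bytes, ignoring line boundaries.

    Assumption: this mimics a tail that works on a fixed-size byte window.
    """
    return data[-window:]


def tail_lines(data: bytes, count: int = 10) -> bytes:
    """Return the last `count` complete lines, like the OS tail default."""
    return b"".join(data.splitlines(keepends=True)[-count:])


# Hypothetical sample: 20 log-like lines, each exactly 114 bytes long
# (113 characters of content and padding, plus the newline).
sample = b"".join(
    (f"10.190.174.142 - - [req {i:02d}] ".ljust(113, "x") + "\n").encode("ascii")
    for i in range(20)
)

byte_tail = tail_bytes(sample)
line_tail = tail_lines(sample)

# The byte window starts mid-line, so the first line loses its leading "10".
print(byte_tail.splitlines()[0][:12])
# The line-based tail always starts on a line boundary.
print(line_tail.splitlines()[0][:14])
# Number of (possibly partial) lines in each result.
print(len(byte_tail.splitlines()), len(line_tail.splitlines()))
```

If this is what is happening, comparing the hadoop fs -tail output against
`tail -c 1024 access_log` on the local copy should show the same bytes.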