Todd Grayson created HDFS-4829:
----------------------------------

             Summary: Strange loss of data displayed in hadoop fs -tail command when data is separated by periods?
                 Key: HDFS-4829
                 URL: https://issues.apache.org/jira/browse/HDFS-4829
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 2.0.1-alpha, 2.0.0-alpha
         Environment: CentOS 6.3 (Intel Core 2 Duo, VMware Player VM running
under Windows 7). Tested on both 2.0.0-cdh4.1.1 and 2.0.1-cdh4.1.2.
            Reporter: Todd Grayson
            Priority: Minor


The hadoop fs -tail command behaves strangely. By default it seems to print 9
lines of output, versus 10 lines from the OS version of the command (a minor
issue). The stranger part (bug behavior?) is that it appears to drop the first
octet of an IP address when examining a file in HDFS.

[training@localhost hands-on]$ hadoop fs -tail weblog/access_log
.190.174.142 - - [03/Dec/2011:13:28:08 -0800] "GET /assets/js/javascript_combined.js HTTP/1.1" 200 20404
10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1" 200 3892
10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 109379
10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657

*When we look at the original log data outside HDFS with the OS version of the
tail command, we see the following:*

[training@localhost hands-on]$ hadoop fs -get weblog/access_log ./
[training@localhost hands-on]$ tail access_log 
10.190.174.142 - - [03/Dec/2011:13:28:06 -0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
10.190.174.142 - - [03/Dec/2011:13:28:08 -0800] "GET /assets/js/javascript_combined.js HTTP/1.1" 200 20404
10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1" 200 3892
10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 109379
10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657
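
The symptoms above (a first line cut off mid-entry, roughly 9 lines instead of 10) are consistent with a tail that returns a fixed number of trailing bytes rather than a fixed number of lines; the Hadoop FileSystem shell documentation describes -tail as displaying the last kilobyte of the file. The Python sketch below is a hypothetical illustration under that assumption, not the FsShell implementation: it compares a byte-based tail (assumed window of 1024 bytes) with a line-based one on a synthetic file built from 108-byte lines modeled on the log entries above.

```python
import os
import tempfile

# Assumption: `hadoop fs -tail` prints a fixed number of trailing *bytes*
# (documented as "the last kilobyte of the file"), whereas the OS `tail`
# prints the last 10 *lines* by default.
TAIL_BYTES = 1024


def tail_bytes(path: str, n: int = TAIL_BYTES) -> bytes:
    """Return the last n bytes of the file, ignoring line boundaries."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - n))
        return f.read()


def tail_lines(path: str, n: int = 10) -> list[bytes]:
    """Return the last n lines of the file, like the OS `tail` default."""
    with open(path, "rb") as f:
        return f.read().splitlines(keepends=True)[-n:]


# Hypothetical sample: 20 copies of one 108-byte access-log line.
LINE = (b'10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET '
        b'/images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446\n')

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(LINE * 20)
    sample_path = tmp.name

by_bytes = tail_bytes(sample_path)
by_lines = tail_lines(sample_path)

# The byte window starts mid-line, so its first "line" is only a fragment.
fragment, rest = by_bytes.split(b"\n", 1)
print(len(by_lines), "lines from the line-based tail")
print(len(rest.splitlines()), "complete lines from the byte-based tail")
print("leading fragment:", fragment.decode())
os.remove(sample_path)
```

On this sample the line-based tail returns 10 lines, while the 1024-byte window holds only 9 complete lines plus a 51-byte fragment that begins partway through the URI, similar in shape to the truncated first line in the hadoop fs -tail output.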

*With non-IP data separated by periods (the same data, substituting names for
the IP octets), it gets even worse and even more data appears to be masked.
Note that we lose the first line well into the URI string.*

[training@localhost hands-on]$ hadoop fs -tail weblog/test_log
s/javascript_combined.js HTTP/1.1" 200 20404
larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1" 200 3892
larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 larry.379
larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657

*And verifying that what we see matches the normal tail output: note that the
first line is not represented in the hadoop fs -tail output, because it only
grabs 9 lines instead of 10, as I mentioned before. Align the two text examples
along the javascript_combined line to compare them.*

[training@localhost hands-on]$ tail test_log
larry.billy.will.amy - - [03/Dec/2011:13:28:06 -0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
larry.billy.will.amy - - [03/Dec/2011:13:28:08 -0800] "GET /assets/js/javascript_combined.js HTTP/1.1" 200 20404
larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1" 200 3892
larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 larry.379
larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657
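
The larry. fragments inside timestamps (13:28:larry.-0800) and byte counts (200 larry.379) appear identically in the hadoop fs -tail output and in the local tail test_log output, so they seem to be part of the file itself. The report does not say how test_log was produced. The hypothetical Python sketch below assumes the octets were replaced with regular expressions in which "." was left unescaped (so it matches any character). Under that assumption it reproduces the Chacha.jpg entry of test_log exactly, whereas escaping the dot leaves the timestamp and byte count intact.

```python
import re

# Hypothetical: the report does not say how test_log was produced. This
# assumes each IP octet was replaced by a regex substitution, applied in order.
SUBSTITUTIONS = [("10.", "larry."), ("190.", "billy."), ("174.", "will."), ("142", "amy")]

# Entry taken from the local `tail access_log` output above.
ORIGINAL = ('10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET '
            '/images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 109379')


def anonymize(line: str, escape_dots: bool) -> str:
    """Apply SUBSTITUTIONS; with escape_dots=True, '.' matches only a literal period."""
    for pattern, replacement in SUBSTITUTIONS:
        if escape_dots:
            pattern = re.escape(pattern)  # "10." becomes r"10\." (literal period)
        # Unescaped, "10." also matches "10 " in "13:28:10 -0800" and "109" in "109379".
        line = re.sub(pattern, replacement, line)
    return line


print(anonymize(ORIGINAL, escape_dots=False))
print(anonymize(ORIGINAL, escape_dots=True))
```

The first call yields the same text as the Chacha.jpg line in test_log (13:28:larry.-0800, 200 larry.379); the second leaves 13:28:10 -0800 and 200 109379 unchanged.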

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
