[ 
https://issues.apache.org/jira/browse/FLUME-828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213447#comment-13213447
 ] 

Mike Percy commented on FLUME-828:
----------------------------------

Hi Brock,
I wonder if toString() should be overridden in SimpleEvent at all. I'm thinking 
it might make sense to factor the hex dumping code out into a static method of 
some utility class that takes an Event as a parameter. That way it would work 
with any Event implementation, not just SimpleEvent.
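
A minimal sketch of what such a utility might look like, producing the 
"hexdump -C"-style line proposed in the issue description. The class name 
EventDumpSketch, the method name dump, and the nested Event interface are 
hypothetical; the interface is a stand-in for Flume's own Event so that the 
example compiles by itself, and only getBody() is modelled.

```java
public final class EventDumpSketch {

    /** Hypothetical stand-in for Flume's Event; only the body accessor is modelled. */
    public interface Event {
        byte[] getBody();
    }

    /** Number of leading body bytes to show, per the proposal in the issue. */
    private static final int MAX_BYTES = 16;

    private EventDumpSketch() {
        // static utility, not instantiable
    }

    /** Renders the first MAX_BYTES bytes of the body as hex plus printable ASCII. */
    public static String dump(Event event) {
        byte[] body = event.getBody();
        if (body == null) {
            return "null";
        }
        int shown = Math.min(body.length, MAX_BYTES);
        StringBuilder hex = new StringBuilder("00000000");
        StringBuilder ascii = new StringBuilder();
        for (int i = 0; i < shown; i++) {
            int b = body[i] & 0xFF; // treat the byte as unsigned
            hex.append(' ').append(String.format("%02X", b));
            // printable ASCII is shown as-is, everything else as '.'
            ascii.append(b >= 0x20 && b < 0x7F ? (char) b : '.');
        }
        String line = hex + " |" + ascii + "|";
        // mark truncation when the body is longer than what was shown
        return body.length > MAX_BYTES ? line + " ..." : line;
    }
}
```

Because the method depends only on getBody(), a sink could call it for any 
Event implementation, and SimpleEvent.toString() would not need to change.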

Regarding the LoggerSink, based on the example of hooking up a Sequence 
Generator Source to a LoggerSink, I think the intention was to simply stringify 
the bytes, assuming they were UTF-8 encoded, and print them in a human-readable 
fashion. So we would assume that the body is the result of String.getBytes() 
and therefore decode via new String(event.getBody(), Charset.forName("UTF-8")). 
Unfortunately, that isn't very helpful in the more general case, so I can see 
the utility of the hex dump.
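
To illustrate why plain stringification breaks down for arbitrary bodies, here 
is a rough sketch of a strict UTF-8 decode that reports failure instead of 
silently substituting replacement characters. The class name BodyDecodeSketch 
and the method decodeStrictUtf8 are hypothetical, and StandardCharsets assumes 
Java 7 or later.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public final class BodyDecodeSketch {

    private BodyDecodeSketch() {
        // static utility, not instantiable
    }

    /** Returns the decoded text, or null if the bytes are not well-formed UTF-8. */
    public static String decodeStrictUtf8(byte[] body) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    // fail on bad input rather than inserting U+FFFD
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(body))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }
}
```

A null result here would be the point at which a sink could fall back to a hex 
dump. Note that binary data can happen to be well-formed UTF-8, so this check 
is only a heuristic and not a substitute for real type information.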

To be honest, I think this bug is an indication that we may be missing some 
important type information in the system that one might want to use to 
determine how to decode a given Event. So regardless of how we fix this bug it 
ends up being kind of a band-aid. :) What do you think?
                
> LoggerSink representation of the event's body isn't too useful
> --------------------------------------------------------------
>
>                 Key: FLUME-828
>                 URL: https://issues.apache.org/jira/browse/FLUME-828
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: NG alpha 1
>            Reporter: Will McQueen
>            Assignee: Brock Noland
>             Fix For: v1.1.0
>
>         Attachments: FLUME-828-0.patch, FLUME-828-1.patch
>
>
> LoggerSink logs entries to the console that look like this:
>      Event: { headers:{} body:[B@5c1ae90c }
> ...where the body is just getClass().getName() + "@" + 
> Integer.toHexString(hashCode()). For a byte array, getClass().getName() will 
> always resolve to [B.
> The issue seems to be how can we represent a SimpleEvent's payload as a 
> String, when the payload is some arbitrary byte array... the array's bytes 
> could represent encoded ascii chars, encoded UTF-8 chars, or binary data such 
> as an encrypted payload. If we default to ASCII translation for everything, 
> then the resulting String won't be useful for binary payloads since not all 
> 256 possible bytes have equivalent printable ASCII chars. Here's one idea:
> For each event body, we can print up to the first 16 bytes in hex format. If 
> there are >16 bytes, then print a "..." suffix at the end. The output would 
> look similar to what you get with unix "hexdump -C". Here's what a sample 
> output from LoggerSink would look like:
>      Event: { headers:{} body: 00000000 54 68 65 20 71 75 69 63 6B 20 62 72 
> 6F 77 6E 20 |The quick brown | ... }
> ...where both the hex and the ASCII are displayed for the first 16 bytes.
> Is it the most useful representation of the body? Probably not. Is it at 
> least more useful than printing "[B@" + Integer.toHexString(hashCode())? I 
> think so.
> The commons io lib has a useful HexDump.dump method we can leverage.
> Thoughts?
