Thanks. So is this a bug? My issue is that I am storing the number of
"bytes" served from my Apache log, and when it's 0, I will end up storing 48
and skewing the reports.

Any thoughts?

Thanks for the find.

_AD

On Tue, Oct 4, 2011 at 2:56 PM, Mingjie Lai <[email protected]> wrote:

> AD.
>
> I noticed the issue before. It's actually not a regex problem, but the way
> Flume prints a byte array as a string on the collector side.
>
> You can also reproduce it by:
> # bin/flume node_nowatch -1 -s -n dump -c 'dump: tail("/tmp/integer") | {
> value("bb", "b") => console};'
>
> Below is the piece of code (Attributes.java). It takes a byte array whose
> length is 1, 4, or 8 and prints it as an int or long. In the case of length 1,
> it only prints the byte value.
>
> ---------------
>      // this is a hack that prints in int, string and double format when there
>      // are 8 bytes.
>      // TODO (jon) this gets grosser and grosser. make a final decision on how
>      // these attributes are going to be
>      if (bytes.length == 8) {
>
>        return "(long)" + readLong(e, attr).toString() + "  (string) '"
>            + readString(e, attr) + "'" + " (double)"
>            + readDouble(e, attr).toString();
>      }
>
>      // this is a similar hack that prints in int and string format when there
>      // are 4 bytes.
>      if (bytes.length == 4) {
>        return readInt(e, attr).toString() + " '" + readString(e, attr) + "'";
>      }
>
>      if (bytes.length == 1) {
>        return "" + (((int) bytes[0]) & 0xff);
>      }
>
> ---------------
>
> -mingjie
>
>
> On 10/03/2011 07:40 PM, AD wrote:
>
>> Hello,
>>
>> I noticed that when using a regex to parse an integer from a file, a
>> value of 0 shows up as the number 48 in the output on the flume
>> command line instead. Has anyone come across this before? Example below:
>>
>> bash-3.2# cat /tmp/integer
>> 0
>>
>> bash-3.2# cat parse.int
>>
>> ./flume node_nowatch -1 -s -n dump -c 'dump: tail("/tmp/integer") | {
>> regexAll("^(\\d+)","mynum") => console }; '
>>
>> bash-3.2# ./parse.int 2>&1 | grep mynum
>>
>>
>> 2011-10-03 22:37:49,526 [main] INFO agent.FlumeNode: System property
>> sun.java.command=com.cloudera.flume.agent.FlumeNode -1 -s -n dump -c
>> dump: tail("/tmp/integer") | { regexAll("^(\\d+)","mynum") => console };
>> 2011-10-03 22:37:49,966 [main] INFO agent.FlumeNode: Loading spec from
>> command line: 'dump: tail("/tmp/integer") | {
>> regexAll("^(\\d+)","mynum") => console }; '
>> lilmac.home [INFO Mon Oct 03 22:37:50 EDT 2011] { *mynum : 48* } {
>> tailSrcFile : integer } 0
>>
>> Cheers,
>> AD
>>
>

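For anyone reading this thread later, here is a minimal sketch of why a "0" in the file shows up as 48. It copies only the 1-byte branch quoted from Attributes.java above; the class and method names are made up for illustration and are not part of Flume. It also assumes the regex attribute holds the matched text as bytes, which matches the explanation that this is a printing issue rather than a regex issue.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Flume.
final class OneByteAttrDemo {

    // Same expression as the 1-byte branch quoted from Attributes.java:
    // the single byte is rendered as its unsigned numeric value.
    static String printOneByte(byte[] bytes) {
        return "" + (((int) bytes[0]) & 0xff);
    }

    // Assumption: the attribute bytes are the matched text, so decoding
    // them as a string and parsing that string recovers the number.
    static long parseAsDecimal(byte[] bytes) {
        return Long.parseLong(new String(bytes, StandardCharsets.UTF_8).trim());
    }

    public static void main(String[] args) {
        // The character '0' is stored as the single byte 0x30.
        byte[] attr = "0".getBytes(StandardCharsets.UTF_8);
        System.out.println(printOneByte(attr));   // prints 48
        System.out.println(parseAsDecimal(attr)); // prints 0
    }
}
```

If that assumption holds, the stored attribute is still the text "0" and only the console rendering is misleading, so a consumer that decodes the bytes as a string before parsing should see 0. Whether a particular sink stores the raw bytes or the printed form would need to be checked separately.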