That makes sense, thanks Rob!

On Thu, Feb 5, 2015 at 6:06 PM, Rob Miller <[email protected]> wrote:

> On 02/05/2015 04:39 AM, Victor Castell wrote:
>
>> Yeah, I know it's different because my input is LogStreamer and the
>> example in the docs was for receiving protobuf over TCP.
>>
>> I want to understand.
>>
>> When my LogStreamer reads a message, it passes to the decoder a Protocol
>> Buffer message with the log line in the message payload, is that
>> right?
>>
> Nope, this is the misunderstanding. When Logstreamer reads a text file, it
> passes to the decoder an instantiated Message struct with the log line in
> the message payload. Protocol buffers aren't involved at all. The only time
> it makes sense to use a ProtobufDecoder with a LogstreamerInput is if the
> file(s) you're loading contain binary, protobuf encoded Heka messages, such
> as those generated by Heka itself using a FileOutput with a
> ProtobufEncoder. This is a valid use case; in fact at Mozilla we do this
> often. Heka even ships with a command line utility called `heka-cat` (
> http://hekad.readthedocs.org/en/dev/developing/testing.html#heka-cat)
> which lets you browse and query the contents of such files.
>
> If the files you're loading are plain text log files, however, a
> ProtobufDecoder will have no idea what to do with them. It will fail on
> every message. And it will slow things down considerably.
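>
> As a minimal sketch of the plain text case: the input points at a
> MultiDecoder whose subs are all text decoders, with no ProtobufDecoder
> anywhere. The input section name, log directory, file pattern, and the
> decoder format strings below are illustrative assumptions and would need
> to match your actual files.
>
> ```toml
> # Hypothetical input section; directory and pattern are placeholders.
> [central-syslog]
> type = "LogstreamerInput"
> log_directory = "/var/log"
> file_match = 'syslog\.log'
> decoder = "syslog-decoder"
>
> # Try the nginx decoder first; anything it rejects goes to rsyslog-decoder.
> [syslog-decoder]
> type = "MultiDecoder"
> subs = ['nginx-access-decoder', 'rsyslog-decoder']
> cascade_strategy = "first-wins"
> log_sub_errors = false   # set to true to log each sub-decoder failure
>
> [nginx-access-decoder]
> type = "SandboxDecoder"
> filename = "lua_decoders/nginx_access.lua"
>
> [nginx-access-decoder.config]
> type = "nginx.access"
> # Placeholder: must match the log_format in your nginx config.
> log_format = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'
>
> [rsyslog-decoder]
> type = "SandboxDecoder"
> filename = "lua_decoders/rsyslog.lua"
>
> [rsyslog-decoder.config]
> # Placeholder: must match the rsyslog template that wrote the file.
> template = '%TIMESTAMP% %HOSTNAME% %syslogtag%%msg:::sp-if-no-1st-sp%%msg:::drop-last-lf%\n'
> ```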
>
>> Using the following config as my input decoder (this is what I actually
>> tried):
>>
>> [syslog-decoder]
>> subs = ['nginx-access-decoder', 'ProtobufDecoder']
>> cascade_strategy = "first-wins"
>> log_sub_errors = true
>>
>> [ProtobufDecoder]
>>
>> This should capture my nginx log lines, remove them from the decoding
>> "cascade", and pass all the rest to ProtobufDecoder, which in turn does
>> nothing.
>>
>> Is this correct?
>>
> The first part is correct: any successfully parsed nginx log lines won't
> make it through to the ProtobufDecoder. But any messages that fail the
> nginx parser will be given to the ProtobufDecoder, which will have no idea
> what to do with them.
>
>> And if it is, why is this so slow?
>>
> See above. :)
>
> -r
>
>
>> On Wed, Feb 4, 2015 at 8:08 PM, Rob Miller <[email protected]> wrote:
>>
>>     The config that you cargo-culted from the docs is meant for an
>>     entirely different use case. That's meant to handle cases where
>>     you're receiving protocol buffer encoded Heka messages, each of
>>     which contains an Nginx access log line as the message payload. This
>>     would be useful in a case where one Heka is loading the log files
>>     but instead of parsing them it's sending them along in protobuf
>>     format to another Heka that's doing the parsing. The config below
>>     would be used on the listener.
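>>
>>     A rough sketch of that two-Heka setup, to make the roles concrete.
>>     The section names other than shipped-nginx-decoder, the host, the
>>     port, and the paths are assumptions for illustration, and the exact
>>     input/output options (framing in particular) vary between Heka
>>     versions, so check the docs for the release you run.
>>
>>     ```toml
>>     # --- Shipper (hypothetical): reads the files, forwards lines unparsed ---
>>     [nginx-logs]
>>     type = "LogstreamerInput"
>>     log_directory = "/var/log/nginx"      # placeholder path
>>     file_match = 'access\.log'
>>     # No decoder here (assumed optional): each line travels as the payload.
>>
>>     [ProtobufEncoder]
>>
>>     [TcpOutput]
>>     address = "listener.example.com:5565" # placeholder host and port
>>     message_matcher = "TRUE"
>>     encoder = "ProtobufEncoder"
>>
>>     # --- Listener (hypothetical): receives protobuf, then parses payloads ---
>>     [TcpInput]
>>     address = ":5565"                     # placeholder port
>>     decoder = "shipped-nginx-decoder"
>>
>>     [shipped-nginx-decoder]
>>     type = "MultiDecoder"
>>     subs = ['ProtobufDecoder', 'nginx-access-decoder']
>>     cascade_strategy = "all"              # run every sub, in order
>>     log_sub_errors = true
>>
>>     [ProtobufDecoder]
>>
>>     [nginx-access-decoder]
>>     type = "SandboxDecoder"
>>     filename = "lua_decoders/nginx_access.lua"
>>     # The [nginx-access-decoder.config] section (log_format etc.) is omitted.
>>     ```
>>
>>     Here the ProtobufDecoder has real work to do, because the bytes
>>     arriving over TCP are protobuf encoded Heka messages rather than
>>     plain text lines.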
>>
>>     If you want to see the decoding errors all you need to do is change
>>     your `log_sub_errors` setting from false to true.
>>
>>     -r
>>
>>
>>
>>     On 02/04/2015 04:19 AM, Victor Castell wrote:
>>
>>         I managed to get a working config but I want to understand
>>         what's going on:
>>
>>         [syslog-decoder]
>>         type = "MultiDecoder"
>>         subs = ['nginx-access-decoder', 'rsyslog-decoder']
>>         cascade_strategy = "first-wins"
>>         log_sub_errors = false
>>
>>         In the nginx-access decoder I'm decoding the corresponding
>>         access.log entries from my rsyslog, and in the rsyslog decoder I'm
>>         capturing any other rsyslog entries and discarding them.
>>
>>         That works well but in my first attempt I tried with the config
>>         extracted from the documentation:
>>
>>         [shipped-nginx-decoder]
>>         type = "MultiDecoder"
>>         subs = ['ProtobufDecoder', 'nginx-access-decoder']
>>         cascade_strategy = "all"
>>         log_sub_errors = true
>>
>>         [ProtobufDecoder]
>>
>>         I would prefer this config to the previous one, because it can
>>         log the errors from my nginx decoding.
>>
>>         The problem is that when using the ProtobufDecoder, decoding is
>>         really slow: my nginx logs don't keep up with current events and
>>         are always behind the current time.
>>
>>         This doesn't happen with the rsyslog-decoder config; it parses
>>         the logs really fast.
>>
>>         I thought it would be much faster using the internal
>>         ProtobufDecoder than a Lua one, but that's not the case.
>>
>>         What's the reason for this?
>>
>>
>>         On Fri, Jan 30, 2015 at 11:31 AM, Victor Castell
>>         <[email protected]> wrote:
>>
>>              Didn't know of that! Life saver
>>
>>              Thanks!
>>
>>              On 30/1/2015 11:17, "Krzysztof Krzyżaniak"
>>              <[email protected]> wrote:
>>
>>                  On Fri, Jan 30, 2015 at 10:34, Victor Castell
>>                  <[email protected]> wrote:
>>         >         Hi!
>>         >
>>         >         I have a centralized rsyslog formatted logfile and I'm
>>         >         extracting nginx logs from there using heka and the
>>         >         nginx access log decoder.
>>         >
>>         >         The problem is that the parser also logs every other log
>>         >         message out to heka.log.
>>         >
>>         >         The volume of non-nginx logs mixed into my rsyslog log is
>>         >         really huge, so the heka.log file is growing like crazy
>>         >         (I have log rotation in place, before you ask).
>>         >
>>         >         Is there a way to conditionally/intentionally suppress
>>         >         the parsing errors of a given decoder?
>>
>>                  You probably want to use a MultiDecoder, which splits the
>>                  nginx logs from the rest, and set log_sub_errors = false
>>                  in the MultiDecoder section.
>>
>>                     eloy
>>
>>
>>
>>
>>         --
>>         V
>>
>>
>>         _________________________________________________
>>         Heka mailing list
>>         [email protected]
>>         https://mail.mozilla.org/listinfo/heka
>>
>>
>>
>>
>>
>> --
>> V
>>
>
>


-- 
V
