Joe,

Thanks for the background on ListenUDP.

The use case I was thinking of was log aggregation. Most logging
frameworks (logback, log4j, etc.) have a UDP appender, and they also
generally have a JSON format/layout that conforms to the "logstash"
format. I was thinking it would be cool to be able to use NiFi as an
alternative to logstash, flume, and whatever other technologies are being
used to get logs into a central location. There are obviously other options
besides UDP, but it seemed easy and well supported.

Maybe a property on the processor could control whether it buffers
datagrams or produces a new FlowFile for each datagram?
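
To make the two behaviors concrete, here is a rough sketch in plain Java.
It is not the ListenUDP code and does not use the NiFi API; the class name,
the Mode enum, and the bufferSize parameter are all made up for
illustration. It only shows how the same datagrams would be grouped into
FlowFile contents under each setting.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: none of these names exist in NiFi.
final class DatagramGrouping {

    // Hypothetical property values for the proposed processor setting.
    enum Mode { CONCATENATE, ONE_PER_DATAGRAM }

    // Returns the content of each FlowFile that would be produced
    // for the given datagram payloads under the chosen mode.
    static List<byte[]> toFlowFileContents(List<byte[]> datagrams,
                                           Mode mode,
                                           int bufferSize) {
        List<byte[]> out = new ArrayList<>();

        if (mode == Mode.ONE_PER_DATAGRAM) {
            // Datagram boundaries are preserved: one FlowFile per payload.
            for (byte[] d : datagrams) {
                out.add(d.clone());
            }
            return out;
        }

        // CONCATENATE: append payloads back to back and emit a FlowFile
        // whenever the next payload would overflow the buffer.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        for (byte[] d : datagrams) {
            if (buf.size() > 0 && buf.size() + d.length > bufferSize) {
                out.add(buf.toByteArray());
                buf.reset();
            }
            buf.write(d, 0, d.length);
        }
        if (buf.size() > 0) {
            out.add(buf.toByteArray()); // emit whatever is left over
        }
        return out;
    }
}
```

With CONCATENATE, two small JSON documents end up back to back in a single
FlowFile, so MergeContent never sees a boundary between them and has nowhere
to put the demarcator. With ONE_PER_DATAGRAM, each document would arrive as
its own FlowFile.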

-Bryan



On Fri, Apr 24, 2015 at 8:45 PM, Joe Witt <joe.w...@gmail.com> wrote:

> Mike Moser: Great thinking!
>
> Bryan
>
> Taken from the ListenUDP docs:  "This processor listens for Datagram
> Packets on a given port and concatenates the contents of those packets
> together generating flow files roughly as often as the internal buffer
> fills up or until no more data is currently available."
>
> Quite honestly when this processor was originally built NiFi didn't
> have the ability to do the sort of fancy 'slab allocation' mechanism
> it supports today when generating a stream of flow files.  So we could
> probably pretty easily reimplement this to behave more like you were
> thinking it should.  But it is probably worth a bit of
> discussion/exploration to see what makes sense.  The case we built it
> for was data arriving in UDP packets and it was structured in such a
> way that simple binary concatenation was sufficient because the data
> was inherently demarcatable/stream processing friendly.  We could,
> however, implement it now such that each UDP datagram becomes a flow
> file.  But not sure that makes sense either.  This is sort of the
> inherent challenge of providing a raw socket listener.  If the 'thing'
> being exchanged is not clear then we're not sure what the boundary of
> a given flow file should be.
>
> I'll stop rambling. If you could describe the use case a bit more,
> we can think about whether providing a mode of 'datagram =
> flowfile' makes sense.
>
> Thanks!
> Joe
>
> On Fri, Apr 24, 2015 at 7:44 PM, Bryan Bende <bbe...@gmail.com> wrote:
> > Thanks for the suggestions... looks like it is in fact coming out of
> > ListenUDP like that. I'll try to figure out if this is expected behavior,
> > or possibly something with how the messages are being sent.
> >
> > Sorry for the false alarm about MergeContent.
> >
> > On Fri, Apr 24, 2015 at 9:48 AM, Michael Moser <moser...@gmail.com> wrote:
> >
> >> At first glance, I would suspect ListenUDP is placing more than one UDP
> >> datagram into one flowfile.  It might be worth spending some time
> >> checking if that can happen.
> >>
> >> -- Mike
> >>
> >>
> >> On Thu, Apr 23, 2015 at 9:35 PM, Joe Witt <joe.w...@gmail.com> wrote:
> >>
> >> > Are you sure you're not sending the [ , ] over UDP as well ;-)
> >> >
> >> > Can you create a template of your flow and send it over?  Perhaps just
> >> > attach to a JIRA for this.  MergeContent is a powerful and useful
> >> > thing so if you're seeing funky behavior we want to sort it out
> >> > quickly.
> >> >
> >> > On Thu, Apr 23, 2015 at 8:47 PM, Bryan Bende <bbe...@gmail.com> wrote:
> >> > > I'm trying to use MergeContent to merge json documents. I have the
> >> > > Header, Demarcator, and Footer properties pointing to files with [ , ]
> >> > > respectively. I left all other properties the same, and set Max Entries
> >> > > to 5 and Max Bin Age to 10 seconds.
> >> > >
> >> > > I have a simple flow with ListenUDP -> MergeContent ->
> >> > > PutSolrContentStream (from the pull request). If I send a bunch of json
> >> > > documents over UDP, most of them will merge correctly, but I'll see a
> >> > > couple where the demarcator didn't get inserted between two json
> >> > > documents.
> >> > >
> >> > > Any thoughts as to why this would happen?
> >> > >
> >> > > I added a significant amount of logging to the
> >> > > getDescriptorFileContent() method in MergeContent to see if there was a
> >> > > reason why it would return null for the demarcator, but nothing obvious
> >> > > is really jumping out at me.
> >> >
> >>
>
