Joe,

Thanks for the background on ListenUDP.

The use case I was thinking of was log aggregation... most logging
frameworks like logback, log4j, etc., have a UDP appender, and they also
generally have a json format/layout that conforms with the "logstash"
format. I was thinking it would be cool to be able to use NiFi as an
alternative to logstash, flume, and whatever other technologies are being
used to get logs into a central location. There are obviously other
options besides udp, but it seemed easy and well supported.

Maybe a property on the processor could control whether or not it
buffered datagrams vs producing a new FlowFile for each datagram?

-Bryan

On Fri, Apr 24, 2015 at 8:45 PM, Joe Witt <joe.w...@gmail.com> wrote:

> Mike Moser: Great thinking!
>
> Bryan
>
> Taken from listen udp docs: "This processor listens for Datagram
> Packets on a given port and concatenates the contents of those packets
> together generating flow files roughly as often as the internal buffer
> fills up or until no more data is currently available."
>
> Quite honestly when this processor was originally built NiFi didn't
> have the ability to do the sort of fancy 'slab allocation' mechanism
> it supports today when generating a stream of flow files. So we could
> probably pretty easily reimplement this to behave more like you were
> thinking it should. But it is probably worth a bit of
> discussion/exploration to see what makes sense. The case we built it
> for was data arriving in UDP packets and it was structured in such a
> way that simple binary concatenation was sufficient because the data
> was inherently demarcatable/stream processing friendly. We could,
> however, implement it now such that each UDP datagram becomes a flow
> file. But not sure that makes sense either. This is sort of the
> inherent challenge of providing a raw socket listener. If the 'thing'
> being exchanged is not clear then we're not sure what the boundary of
> a given flow file should be.
>
> I'll stop rambling: Please if you would describe the use case a bit
> more we can think about whether providing a mode of 'datagram =
> flowfile' makes sense.
>
> Thanks!
> Joe
>
> On Fri, Apr 24, 2015 at 7:44 PM, Bryan Bende <bbe...@gmail.com> wrote:
> > Thanks for the suggestions... looks like it is in fact coming out of
> > ListenUDP like that. I'll try to figure out if this is expected behavior,
> > or possibly something with how the messages are being sent.
> >
> > Sorry for the false alarm about MergeContent.
> >
> > On Fri, Apr 24, 2015 at 9:48 AM, Michael Moser <moser...@gmail.com> wrote:
> >
> >> At first glance, I would suspect ListenUDP is placing more than one UDP
> >> datagram into one flowfile. It might be worth spending some time checking
> >> if that can happen.
> >>
> >> -- Mike
> >>
> >>
> >> On Thu, Apr 23, 2015 at 9:35 PM, Joe Witt <joe.w...@gmail.com> wrote:
> >>
> >> > Are you sure you're not sending the [ , ] over UDP as well ;-)
> >> >
> >> > Can you create a template of your flow and send it over? Perhaps just
> >> > attach to a JIRA for this. MergeContent is a powerful and useful
> >> > thing so if you're seeing funky behavior we want to sort it out
> >> > quickly.
> >> >
> >> > On Thu, Apr 23, 2015 at 8:47 PM, Bryan Bende <bbe...@gmail.com> wrote:
> >> > > I'm trying to use MergeContent to merge json documents. I have the
> >> > > Header, Demarcator, and Footer properties pointing to files with [ , ]
> >> > > respectively. I left all other properties the same, and set Max Entries
> >> > > to 5 and Max Bin Age to 10 seconds.
> >> > >
> >> > > I have a simple flow with ListenUDP -> MergeContent ->
> >> > > PutSolrContentStream (from the pull request). If I send a bunch of json
> >> > > documents over UDP, most of them will merge correctly, but I'll see a
> >> > > couple where the demarcator didn't get inserted between two json
> >> > > documents.
> >> > >
> >> > > Any thoughts as to why this would happen?
> >> > >
> >> > > I added a significant amount of logging to the
> >> > > getDescriptorFileContent() method in MergeContent to see if there was
> >> > > a reason why it would return null for the demarcator, but nothing
> >> > > obvious is really jumping out at me.
> >> >
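
For anyone reading this thread later, the symptom above (a "missing" demarcator between two json documents) can be reproduced without NiFi at all. The sketch below is only an illustration and is not the ListenUDP or MergeContent implementation: the function names, the 20-byte buffer size, and the overflow rule are all assumptions made up for the example. It compares concatenating datagram payloads into a buffer against treating each datagram as its own unit, and then merges the units with a header, demarcator, and footer of [ , ].

```python
import json


def buffered_units(datagrams, buffer_size):
    """Hypothetical stand-in for the buffering behaviour: payloads are
    concatenated, and a unit is emitted when the next payload would not fit."""
    units, buf = [], b""
    for payload in datagrams:
        if buf and len(buf) + len(payload) > buffer_size:
            units.append(buf)
            buf = b""
        buf += payload
    if buf:
        units.append(buf)
    return units


def per_datagram_units(datagrams):
    """Hypothetical 'datagram = flowfile' mode: one unit per datagram."""
    return list(datagrams)


def merge(units, header=b"[", demarcator=b",", footer=b"]"):
    """Join units the way a header/demarcator/footer merge would."""
    return header + demarcator.join(units) + footer


def is_valid_json(data):
    """Return True if the bytes parse as JSON."""
    try:
        json.loads(data)
        return True
    except ValueError:
        return False


# Four small json documents, each sent as its own datagram.
datagrams = [json.dumps({"id": i}).encode() for i in range(4)]

merged_buffered = merge(buffered_units(datagrams, buffer_size=20))
merged_single = merge(per_datagram_units(datagrams))

print(merged_buffered, is_valid_json(merged_buffered))
print(merged_single, is_valid_json(merged_single))
```

In the buffered case the demarcator is inserted correctly between units, but two documents already share one unit, so the merged array is not valid json. With one unit per datagram the same merge produces a valid array.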