There are really two different classes of queues, and you can't mix their semantics.

The pub/sub model

 * ordering isn't guaranteed
 * messages are delivered at least once and may be duplicated
 * messages must be explicitly deleted or aged out
 * messages may or may not be persisted

The event model

 * ordering is necessary and guaranteed
 * messages appear only once
 * once read, a message is discarded from the queue
 * messages are probably persisted
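
The distinction above can be sketched in a few lines of Python. This is an illustrative, in-memory sketch only (the class names `EventQueue` and `PubSubTopic` are invented for the example, not anything from NiFi or Kafka): the event queue hands each message to exactly one consumer, while the topic replays the full stream to every subscriber from its own read position.

```python
from collections import deque

class EventQueue:
    """Event-model queue: ordered, consume-once."""
    def __init__(self):
        self._q = deque()

    def publish(self, msg):
        self._q.append(msg)

    def consume(self):
        # Once read, the message is gone for every other consumer.
        return self._q.popleft() if self._q else None

class PubSubTopic:
    """Pub/sub-model topic: every subscriber sees every message."""
    def __init__(self):
        self._log = []      # retained until explicitly aged off
        self._offsets = {}  # per-subscriber read position

    def publish(self, msg):
        self._log.append(msg)

    def poll(self, subscriber):
        # Each subscriber advances its own offset independently.
        pos = self._offsets.get(subscriber, 0)
        batch = self._log[pos:]
        self._offsets[subscriber] = len(self._log)
        return batch
```

With the topic, two subscribers polling the same stream each receive everything; with the event queue, whichever consumer reads first wins.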

Kafka can be used for both models, but NiFi Output Ports are meant to be event queues. What you could do, though, is have an external message broker connect to the Output Port and distribute its contents to subscribers. It could be Kafka, RabbitMQ, ActiveMQ, AWS's SQS/SNS, whatever makes sense in the context.

There's no need to modify or extend NiFi, then.

S



Aldrin Piri <mailto:[email protected]>
Thursday, March 16, 2017 1:07 PM
Hey Andre,

Interesting scenario and certainly can understand the need for such
functionality. As a bit of background, my mind traditionally goes to
custom controller services used for referencing datasets typically served
up via some service. This means we don't get the Site to Site goodness and
may be duplicating effort in terms of transiting information. Aspects of
this need are emerging in some of the initial C2 efforts, where we have this
1-N dispersion of flows/updates to instances; initial approaches are
certainly reminiscent of the above. I am an advocate for Site to Site
being "the way" data transport is done amongst NiFi/MiNiFi instances and
this is interlaced, in part, with the JIRA to make this an extension point
[1]. Perhaps, more simply and to frame in the sense of messaging, we have
no way of providing topic semantics between instances and only support
queueing whether that is push/pull. This new component or mode would be
very compelling in conjunction with the introduction of new protocols each
with their own operational guarantees/caveats.

Some of the first thoughts/questions that come to mind are:
* what buffering/age off looks like in the context of a connection. In part,
I think we have the framework functionality already in place, but it requires
a slightly different thought process and context.
* management of progress through the "queue", for lack of a better word,
on a given topic and how/where that gets stored/managed. this would be the
analog of offsets
* is prioritization still a possibility? at first blush, it seems like
this would no longer make sense and/or be applicable
* what impact does this have on provenance? seems like it would still map
correctly; just many child send events for a given piece of data
* what would the sequence for the receiving input port look like? just use the
run schedule we have currently? Presumably this would be used for updates, so
I schedule it to check every N minutes and get all the updates since then?
(this could potentially be mitigated with backpressure/expiration
configuration on the associated connection).
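
The "analog of offsets" and age-off questions above can be made concrete with a small sketch. Everything here is hypothetical (the `TopicConnection` class and its method names are invented for illustration, not a proposed NiFi API): the connection keeps an absolute base index so per-consumer offsets stay valid as old flowfiles are aged off, and consumers that fall behind simply miss the expired messages.

```python
class TopicConnection:
    """Hypothetical topic-style connection with age-off and
    per-consumer offsets (the analog of Kafka offsets)."""

    def __init__(self, max_age_seconds):
        self.max_age = max_age_seconds
        self._log = []      # list of (timestamp, flowfile)
        self._base = 0      # absolute offset of _log[0]
        self._offsets = {}  # consumer id -> absolute offset

    def enqueue(self, flowfile, now):
        self._log.append((now, flowfile))

    def age_off(self, now):
        # Drop entries older than max_age. Offsets are absolute,
        # so consumers that fell behind miss the expired messages.
        while self._log and now - self._log[0][0] > self.max_age:
            self._log.pop(0)
            self._base += 1

    def pull(self, consumer, now):
        self.age_off(now)
        start = max(self._offsets.get(consumer, self._base), self._base)
        batch = [ff for _, ff in self._log[start - self._base:]]
        self._offsets[consumer] = self._base + len(self._log)
        return batch
```

Prioritization indeed stops making sense in this mode: each consumer reads the retained log in order from its own offset, so there is no single queue to reorder.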

I agree there is a certain need to fulfill that seems applicable to a
number of situations and finding a way to support this general data
exchange pattern in a framework level would be most excellent. Look
forward to discussing and exploring a bit more.

--aldrin

[1] https://issues.apache.org/jira/browse/NIFI-1820


Andre <mailto:[email protected]>
Thursday, March 16, 2017 12:27 PM
dev,

I recently created a demo environment where two remote MiNiFi instances (m1
and m2) were sending a diverse range of security telemetry (suspicious email
attachments, syslog streams, individual honeypot session logs, merged
honeypot session logs, etc.) from edge to DC via S2S Input Ports.

Once some of this data was processed at the hub, I then used Output Ports to
send content back to the spokes, where the MiNiFi instances use the
flowfile contents as arguments to OS commands (called via Groovy's
String.execute().text in ExecuteScript).
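
For readers unfamiliar with the Groovy idiom: `content.execute().text` runs the string as an OS command and captures its stdout. A rough Python analog of what the spokes do (a sketch only; the demo itself used Groovy inside NiFi's ExecuteScript processor, and the function name here is invented) would be:

```python
import shlex
import subprocess

def run_flowfile_command(content: str) -> str:
    """Rough Python analog of Groovy's content.execute().text:
    treat the flowfile content as an OS command line and
    return the command's stdout."""
    result = subprocess.run(
        shlex.split(content),  # tokenize the command line
        capture_output=True,
        text=True,
        check=True,            # raise if the command fails
    )
    return result.stdout
```

In the orchestration demo, the flowfile content would be something like a firewall command carrying a malicious IP observed at the other spoke.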

The idea being to show how NiFi can be used in basic security orchestration
(in this case updating m1's firewall tables with malicious IPs observed in
m2 and vice versa).


While crafting the demo I noticed the Output Ports operate like queues:
if one client consumed data from the port, the other was unable
to obtain the same flowfiles.

This is obviously not an issue when using 2 MiNiFi clients (where I can
just create another Output Port and clone the content) but wouldn't scale
very well to hundreds of clients.

I wonder if anyone would have a suggestion on how to achieve an N to 1
Output Port like that? And if not, I wonder if we should create one?

Cheers


--
Simon Lucy
Technologist
G30 Consultants Ltd
+44 77 20 29 4658
