Hey Andre,

Interesting scenario, and I can certainly understand the need for such
functionality.  As a bit of background, my mind traditionally goes to
custom controller services used for referencing datasets, typically served
up via some service.  That approach means we don't get the Site-to-Site
goodness and may be duplicating effort in terms of transiting information.
Aspects of this need are emerging in some of the initial C2 efforts, where
we have this 1-N dispersion of flows/updates to instances; the initial
approaches are certainly reminiscent of the above.  I am an advocate for
Site-to-Site being "the way" data transport is done amongst NiFi/MiNiFi
instances, and this is interlaced, in part, with the JIRA to make this an
extension point [1].  Perhaps more simply, to frame it in messaging terms:
we have no way of providing topic semantics between instances and only
support queueing, whether that is push or pull.  This new component or mode
would be very compelling in conjunction with the introduction of new
protocols, each with their own operational guarantees/caveats.

Some of the first thoughts/questions that come to mind are:
 * What buffering/age-off looks like in the context of a connection.  In
part, I think we have the framework functionality already in place, but it
requires a slightly different thought process and context.
 * Management of progress through the "queue", for lack of a better word,
on a given topic, and how/where that gets stored/managed.  This would be
the analog of offsets.
 * Is prioritization still a possibility?  At first blush, it seems like
this would no longer make sense and/or be applicable.
 * What impact does this have on provenance?  Seems like it would still map
correctly; just many child send events for a given piece of data.
 * What would the sequencing of the receiving input port look like?  Just
use the run schedule we have currently?  Presumably this would be used for
updates, so I schedule it to check every N minutes and get all the updates
since then?  (This could potentially be mitigated with
backpressure/expiration configuration on the associated connection.)
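
To make the queue-vs-topic distinction concrete, here is a minimal sketch
(plain Python with invented class names, not actual NiFi APIs) contrasting
today's Output port behavior, where consumers compete for each flowfile,
with a topic-style port where each consumer tracks its own offset into a
retained log and therefore sees every flowfile:

```python
from collections import deque


class QueueOutputPort:
    """Today's behavior: consumers compete for flowfiles."""

    def __init__(self):
        self._queue = deque()

    def publish(self, flowfile):
        self._queue.append(flowfile)

    def pull(self, consumer_id):
        # Whichever consumer pulls first removes the flowfile for everyone.
        return self._queue.popleft() if self._queue else None


class TopicOutputPort:
    """Sketched topic semantics: a retained log with per-consumer offsets."""

    def __init__(self):
        self._log = []          # retained flowfiles (subject to age-off)
        self._offsets = {}      # consumer_id -> next index to read

    def publish(self, flowfile):
        self._log.append(flowfile)

    def pull(self, consumer_id):
        # Each consumer advances only its own offset, so every consumer
        # eventually observes every flowfile.
        offset = self._offsets.get(consumer_id, 0)
        if offset >= len(self._log):
            return None
        self._offsets[consumer_id] = offset + 1
        return self._log[offset]


queue_port = QueueOutputPort()
queue_port.publish("malicious-ip-1")
print(queue_port.pull("m1"))  # "malicious-ip-1" -- m1 consumed it
print(queue_port.pull("m2"))  # None -- m2 missed it

topic_port = TopicOutputPort()
topic_port.publish("malicious-ip-1")
print(topic_port.pull("m1"))  # "malicious-ip-1"
print(topic_port.pull("m2"))  # "malicious-ip-1" -- both spokes see it
```

The buffering/age-off and offset-storage questions above are exactly the
parts this sketch glosses over: `_log` would need bounded retention, and
`_offsets` would need to be durably stored and managed somewhere.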

I agree there is a real need to fulfill here that seems applicable to a
number of situations, and finding a way to support this general data
exchange pattern at the framework level would be most excellent.  Look
forward to discussing and exploring a bit more.

--aldrin

[1] https://issues.apache.org/jira/browse/NIFI-1820

On Thu, Mar 16, 2017 at 8:27 AM, Andre <[email protected]> wrote:

> dev,
>
> I recently created a demo environment where two remote MiNiFi instances (m1
> and m2) were sending a diverse range of security telemetry (suspicious email
> attachments, syslog streams, individual session honeypot logs, merged
> honeypot session logs, etc) from edge to DC via S2S Input ports
>
> Once some of this data was processed at the hub I then used Output ports to
> send contents back to the spokes, where the minifi instances use the
> flowfiles' contents as arguments of OS commands (called via Groovy
> String.execute().text via ExecuteScript).
>
> The idea being to show how NiFi can be used in basic security orchestration
> (in this case updating m1's firewall tables with malicious IPs observed in
> m2 and vice versa).
>
>
> While crafting the demo I noticed the Output ports operate like queues,
> therefore if one client consumed data from the port, the other was unable
> to obtain the same flowfiles.
>
> This is obviously not an issue when using 2 minifi clients (where I can
> just create another output port and clone to content) but wouldn't flow
> very well with hundreds of clients.
>
> I wonder if anyone would have a suggestion of how to achieve a N to 1
> Output port like that? And if not, I wonder if we should create one?
>
> Cheers
>