Nissim

I like the idea of introducing a more refined type of event for how data is
brought into NiFi (active - PULL, passive - RECEIVE).

That said, it might be sufficient to simply have this distinction be on the
"RECEIVE" event as a metadata item specifying active vs passive.  The
protocol used, as given in the transit URI, should clarify this
though.

In short - I think there is a way here that is all opt-in for existing
users and components.
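
A rough sketch of what I mean, to make the opt-in part concrete. This is not
NiFi API: the Event class, the "receive.mode" key and the "active"/"passive"
values are all made up for illustration, and the URIs are placeholders.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class ReceiveModeSketch {

    // Hypothetical metadata key and values; nothing like this exists in NiFi today.
    static final String RECEIVE_MODE = "receive.mode";
    static final String ACTIVE = "active";   // component went and got the data (GetXYZ)
    static final String PASSIVE = "passive"; // component listened for the data (ListenXYZ)

    // Stand-in for a provenance event: event type, transit URI, free-form metadata.
    static final class Event {
        final String type;
        final String transitUri;
        final Map<String, String> metadata;

        Event(String type, String transitUri, Map<String, String> metadata) {
            this.type = type;
            this.transitUri = transitUri;
            this.metadata = metadata;
        }
    }

    // Components that opt in tag their RECEIVE event; everyone else passes null.
    static Event receive(String transitUri, String mode) {
        Map<String, String> metadata = new HashMap<>();
        if (mode != null) {
            metadata.put(RECEIVE_MODE, mode);
        }
        return new Event("RECEIVE", transitUri, metadata);
    }

    // Consumers that care read the tag; untagged events yield an empty Optional.
    static Optional<String> receiveMode(Event event) {
        return Optional.ofNullable(event.metadata.get(RECEIVE_MODE));
    }

    // An overseer matching SEND/RECEIVE pairs would only skip the SEND lookup
    // for RECEIVE events explicitly tagged as active; untagged events behave as today.
    static boolean expectsMatchingSend(Event event) {
        return "RECEIVE".equals(event.type)
                && !ACTIVE.equals(event.metadata.get(RECEIVE_MODE));
    }

    public static void main(String[] args) {
        Event pulled = receive("sftp://nifi-1.example.com/data/in.txt", ACTIVE);
        Event pushed = receive("http://nifi-2.example.com:8080/contentListener", PASSIVE);
        Event legacy = receive("file:///tmp/in.txt", null);

        System.out.println(expectsMatchingSend(pulled));
        System.out.println(expectsMatchingSend(pushed));
        System.out.println(expectsMatchingSend(legacy));
    }
}
```

The event type stays RECEIVE throughout, so existing consumers of provenance
data that never look at the extra key would see no change.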

Thanks

On Tue, Oct 29, 2019 at 9:41 AM Nissim Shiman <[email protected]>
wrote:

>  Adam,
> good points...
> I missed a step in explaining the use case where Provenance Events are
> incomplete...
> Where the second nifi does a GetSFTP from the *filesystem* that the first
> nifi is located on
> So the second nifi currently sends a RECEIVE event, but there is no
> corresponding SEND event from the first nifi (nor should there be)
> If the second nifi sent a PULL event, it would be easier for a system
> overseer to know that there should be no corresponding SEND event
>
> Currently send/receive works well when nifi 1 does a PostHTTP and nifi 2
> does a ListenHTTP, but not in the case above.
>
> The ERROR case you mention is a nice point as well, although not my
> specific issue at the moment.
> Thanks,
> Nissim
>
>
>
>
>
>     On Monday, October 28, 2019, 11:52:57 PM EDT, Adam Taft <
> [email protected]> wrote:
>
>  > But a flowfile that was PULLed by the second nifi (from the first nifi)
> will not necessarily have any provenance event generated by the first nifi.
>
> Isn't this the fault of the first NiFi to fail to emit a SEND event in
> response to the second NiFi's request?  In this scenario, shouldn't the
> send/receive pair be:
> NiFi-1 [SEND] :: NiFi-2 [RECEIVE]?
>
> What you describe is an odd use case for NiFi.  NiFi is usually not in the
> business of acting as a file server daemon in order to "send" flowfiles to
> other systems.  As you mention, HandleHttpResponse may be a lone wolf
> example processor which generates a SEND event whose input originates from
> a "listener". [1]  The other ListenXYZ processors generally issue RECEIVE
> events because they are receiving bytes, not generating them.
>
> Are there other processors in question? Something custom? Or is this
> related to site-to-site transfers?
>
> I still kind of question the motive of a provenance event pair that is
> trying to establish "who called who first".  Honestly just trying to
> understand the use case where a matching SEND/RECEIVE pair doesn't give you
> what you need.
>
> The only thing I could see would be a processor that asks for data, but
> then doesn't receive it due to some error condition.  In this case, adding
> some sort of ERROR event might be useful.  "I attempted to retrieve data
> from ${uri}, but the transfer failed because of ${error condition}".  That
> way, GetXYZ processors could report an error to provenance instead of as a
> bulletin.
>
> If the problem is related to a processor or the framework itself not
> generating an event, can we just fix that function to emit SEND in the
> scenario that you describe?  Changing the provenance model itself (beyond
> possibly adding an ERROR event) feels like it would be the last scenario to
> consider.
>
> Thanks,
> Adam
>
> [1]
>
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpResponse.java#L191
>
>
>
>
> On Mon, Oct 28, 2019 at 4:47 PM Nissim Shiman <[email protected]>
> wrote:
>
> >  Adam,
> > I believe there is a need for more detailed ProvenanceEvents.
> > A use case would be a customer that is trying to track data passed
> > between two nifi's and trying to match up SENDs and RECEIVEs
> >
> > So a flowfile that has a SEND event on the first nifi should have a
> > RECEIVE event on the second nifi.
> > But a flowfile that was PULLed by the second nifi (from the first nifi)
> > will not necessarily have any provenance event generated by the first
> > nifi.
> >
> > (I realize that FETCH is already a "reserved word" in the current
> > ProvenanceEvents setup, so I was hoping PULL could be used instead.)
> > There is another Provenance Event, ACKNOWLEDGE, which would also fit
> > this model occasionally (an example would be the HandleHttpResponse
> > processor, which could send this instead of SEND when responding to an
> > HTTP request).
> > This being said, you make an excellent point when you said
> > "However even more important to realize,
> > this change would affect many other downstream consumers of provenance
> > data which aren't necessarily in the stock NiFi distribution."
> > Thanks,
> > Nissim
> >
> >    On Friday, October 11, 2019, 11:30:19 AM EDT, Nissim Shiman
> > <[email protected]> wrote:
> >
> >  Adam,
> > "Yes" to your first question and the four processor examples you listed.
> >
> > I will need to get back to you regarding your other points.
> >
> > Thanks,
> > Nissim
> >
> >    On Thursday, October 10, 2019, 7:05:57 PM EDT, Adam Taft <
> > [email protected]> wrote:
> >
> >  Nissim,
> >
> > Just to be clear, you are trying to distinguish between processors which
> > are actively "pulling" data (GetXYZ) vs. processors which just "listen"
> > for data (ListenXYZ)?  Is that your basic vision?
> >
> > GetFile => PULL
> > GetHTTP => PULL
> > ListenHTTP => RECEIVE
> > ListenTCP => RECEIVE
> >
> > Could you clarify what advantages this would have in terms of data
> > provenance?  What would you use this new event type for specifically?
> > What are you missing now? Do you have a use case that needs this, or are
> > you just generally trying to round out the provenance event types for
> > sake of completeness?  I honestly don't know a use case where you care
> > whether you polled for the data or listened for it.  The provenance model
> > today just cares that you received the data, not so much how you received
> > it.
> >
> > You're right that this proposal will affect many processors and the
> > internal visualization tools, etc.  However even more important to
> > realize, this change would affect many other downstream consumers of
> > provenance data which aren't necessarily in the stock NiFi distribution.
> > For example, any third-party/custom ReportingTask that handles provenance
> > data would need to be updated with this change.  There's probably need
> > for a strong vision to help demonstrate the value for this vs. the cost
> > of the cascading effects related to this change.
> >
> > Thanks,
> > Adam
> >
> >
> > On Thu, Oct 10, 2019 at 4:02 PM Nissim Shiman <[email protected]>
> > wrote:
> >
> > > Hello Team,
> > >
> > > The ProvenanceEventType class does a good job capturing possible
> > > events, but the PULL event doesn't seem to fall nicely into any of the
> > > existing types.
> > >
> > > https://gitbox.apache.org/repos/asf?p=nifi.git;a=blob;f=nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java
> > > RECEIVE is the closest, but RECEIVE is passive and doesn't capture the
> > > active action of a PULL
> > >
> > > And... maybe it would fall into FETCH, but FETCH is more focused on
> > > contents of an existing flow file being overwritten.
> > >
> > > What does the community think about a new PULL event type,
> > > or
> > > using FETCH for PULL, and having what FETCH does now be a new event
> > > such as REUSE?
> > >
> > > NOTE: a new PULL event would have a cascading effect: many processors
> > > that currently emit RECEIVEs would be modified to emit PULL instead
> > > (e.g. GetFile would no longer emit a RECEIVE, but rather a PULL), but
> > > it would more accurately capture the event.
> > >
> > > Thanks,
> > > Nissim Shiman
> > >
> > >
> >
>
