Nissim,

I like the idea of introducing a more refined type of event for how data is brought into nifi (active - PULL, passive - RECEIVE).
That said, it might be sufficient to simply have this distinction be on the "RECEIVE" event as a metadata item specifying active vs. passive. The protocol utilized, as mentioned in the transport URI, should clarify this though. In short, I think there is a way here that is all opt-in for existing users and components.

Thanks

On Tue, Oct 29, 2019 at 9:41 AM Nissim Shiman <[email protected]> wrote:

> Adam,
> good points...
> I missed a step in explaining the use case where Provenance Events is
> incomplete...
> Where the second nifi does a GetSFTP from the *filesystem* that the first
> nifi is located on
> So the second nifi currently sends a RECEIVE event, but there is no
> corresponding SEND event from the first nifi (nor should there be)
> If the second nifi sent a PULL event, it would be easier for a system
> overseer to know that there should be no corresponding SEND event
>
> Currently send/receive works well when nifi 1 does a PostHTTP and nifi 2
> does a ListenHTTP, but not in the case above.
>
> The ERROR case you mention is a nice point as well, although not my
> specific issue at the moment.
> Thanks,
> Nissim
>
> On Monday, October 28, 2019, 11:52:57 PM EDT, Adam Taft
> <[email protected]> wrote:
>
> > But a flowfile that was PULLed by the second nifi (from the first nifi)
> > will not necessarily have any provenance event generated by the first nifi.
>
> Isn't this the fault of the first NiFi to fail to emit a SEND event in
> response to the second NiFi's request? In this scenario, shouldn't the
> send/receive pair be:
> NiFi-1 [SEND] :: NIFI-2 [RECEIVE]?
>
> What you describe is an odd use case for NiFi. NiFi is usually not in the
> business of acting as a file server daemon in order to "send" flowfiles to
> other systems. As you mention, HandleHttpResponse may be a lone wolf
> example processor which generates a SEND event whose input originates from
> a "listener".
> [1] The other ListenXYZ processors generally issue RECEIVE
> events because they are receiving bytes, not generating them.
>
> Are there other processors in question? Something custom? Or is this
> related to site-to-site transfers?
>
> I still kind of question the motive of a provenance event pair that is
> trying to establish "who called who first". Honestly just trying to
> understand the use case where a matching SEND/RECEIVE pair doesn't give you
> what you need.
>
> The only thing I could see would be a processor that asks for data, but
> then doesn't receive it due to some error condition. In this case, adding
> some sort of ERROR event might be useful. "I attempted to retrieve data
> from ${uri}, but the transfer failed because of ${error condition}". That
> way, GetXYZ processors could report an error to provenance instead of as a
> bulletin.
>
> If the problem is related to a processor or the framework itself not
> generating an event, can we just fix that function to emit SEND in the
> scenario that you describe? Changing the provenance model itself (beyond
> possibly adding an ERROR event) feels like it would be the last scenario to
> consider.
>
> Thanks,
> Adam
>
> [1]
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpResponse.java#L191
>
> On Mon, Oct 28, 2019 at 4:47 PM Nissim Shiman <[email protected]>
> wrote:
>
> > Adam,
> > I believe there is a need for more detailed ProvenanceEvents.
> > A use case would be a customer that is trying to track data passed between
> > two nifi's and trying to match up SENDs and RECEIVEs
> >
> > So a flowfile that has a SEND event on the first nifi should have a
> > RECEIVE event on the second nifi.
> > But a flowfile that was PULLed by the second nifi (from the first nifi)
> > will not necessarily have any provenance event generated by the first nifi.
> >
> > (I realize that FETCH is already a "reserved word" in the current
> > ProvenanceEvents setup, so I was hoping PULL could be used instead.)
> > There is another Provenance Event, ACKNOWLEDGE, which would also fit
> > occasionally to this model as well (an example would be HandleHttpResponse
> > processor which could send this instead of SEND when responding to a HTTP
> > request)
> > This being said, you make an excellent point when you said
> > "However even more important to realize,
> > this change would affect many other downstream consumers of provenance data
> > which aren't necessarily in the stock NiFi distribution."
> > Thanks,
> > Nissim
> >
> > On Friday, October 11, 2019, 11:30:19 AM EDT, Nissim Shiman
> > <[email protected]> wrote:
> >
> > Adam,
> > "Yes" to your first question and the four processor examples you listed.
> >
> > I will need to get back to you regarding your other points.
> >
> > Thanks,
> > Nissim
> >
> > On Thursday, October 10, 2019, 7:05:57 PM EDT, Adam Taft
> > <[email protected]> wrote:
> >
> > Nissim,
> >
> > Just to be clear, you are trying to distinguish between processors which
> > are actively "pulling" data (GetXYZ) vs. processors which just "listen" for
> > data (ListenXYZ)? Is that your basic vision?
> >
> > GetFile => PULL
> > GetHTTP => PULL
> > ListenHTTP => RECEIVE
> > ListenTCP => RECEIVE
> >
> > Could you clarify what advantages this would have in terms of data
> > provenance? What would you use this new event type for specifically? What
> > are you missing now? Do you have a use case that needs this, or are you
> > just generally trying to round out the provenance event types for sake of
> > completeness? I honestly don't know a use case where you care whether you
> > polled for the data or listened for it. The provenance model today just
> > cares that you received the data, not so much how you received it.
> >
> > You're right that this proposal will affect many processors and the
> > internal visualization tools, etc. However even more important to realize,
> > this change would affect many other downstream consumers of provenance data
> > which aren't necessarily in the stock NiFi distribution. For example, any
> > third-party/custom ReportingTask that handles provenance data would need to
> > be updated with this change. There's probably need for a strong vision to
> > help demonstrate the value for this vs. the cost of the cascading effects
> > related to this change.
> >
> > Thanks,
> > Adam
> >
> > On Thu, Oct 10, 2019 at 4:02 PM Nissim Shiman <[email protected]>
> > wrote:
> >
> > > Hello Team,
> > >
> > > The ProvenanceEventType class does a good job capturing possible events,
> > > but the PULL event doesn't seem to fall nicely into any of the existing
> > > types.
> > >
> > > https://gitbox.apache.org/repos/asf?p=nifi.git;a=blob;f=nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java
> > > RECEIVE is the closest, but RECEIVE is passive and doesn't capture the
> > > active action of a PULL
> > >
> > > And... maybe it would fall into FETCH, but FETCH is more focused on
> > > contents of an existing flow file being overwritten.
> > >
> > > What does the community think about a new PULL event type,
> > > or
> > > using FETCH for PULL, and having what FETCH does now be a new event such
> > > as REUSE
> > >
> > > NOTE: a new PULL event would have a cascading effect of many processors
> > > that currently are emitting RECEIVE's being modified to be PULL
> > > (i.e. So GetFile would no longer be a RECEIVE, but rather a PULL), but
> > > would more accurately capture the event.
> > >
> > > Thanks,
> > > Nissim Shiman
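
For illustration only, here is a rough sketch of how the opt-in idea discussed in this thread (keeping RECEIVE and marking it active vs. passive as metadata) might help an overseer reconcile SEND and RECEIVE events across two instances. It is plain Python, not NiFi code: the `Event` class, the `receive_mode` field, the shared `flowfile_id` used for matching, and the example URIs are all hypothetical names made up for this sketch and do not correspond to any existing NiFi API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Event:
    """Simplified stand-in for a provenance event (hypothetical, not the NiFi class)."""
    event_type: str                     # e.g. "SEND" or "RECEIVE"
    flowfile_id: str                    # assumed identifier shared by both instances
    transit_uri: str
    receive_mode: Optional[str] = None  # hypothetical metadata: "active" or "passive"


def unmatched_receives(sender_events: Iterable[Event],
                       receiver_events: Iterable[Event]) -> List[Event]:
    """Return RECEIVE events that should have a matching SEND but do not.

    A RECEIVE marked "active" (the receiver pulled the data itself) is skipped,
    because no SEND is expected for it. Events without the metadata are treated
    as passive, so existing components keep their current behaviour.
    """
    sent_ids = {e.flowfile_id for e in sender_events if e.event_type == "SEND"}
    return [
        e for e in receiver_events
        if e.event_type == "RECEIVE"
        and e.receive_mode != "active"
        and e.flowfile_id not in sent_ids
    ]


# Hypothetical events: nifi 1 does a PostHTTP, nifi 2 does ListenHTTP and GetSFTP.
nifi1_events = [
    Event("SEND", "ff-1", "http://nifi-2.example:8081/contentListener"),
]
nifi2_events = [
    Event("RECEIVE", "ff-1", "http://nifi-2.example:8081/contentListener", "passive"),
    Event("RECEIVE", "ff-2", "sftp://nifi-1.example/data/out/a.txt", "active"),
    Event("RECEIVE", "ff-3", "http://nifi-2.example:8081/contentListener"),
]

missing = unmatched_receives(nifi1_events, nifi2_events)
print([e.flowfile_id for e in missing])  # prints ['ff-3']
```

In this sketch the GetSFTP-style RECEIVE (`ff-2`) is not reported as missing a SEND because it is marked active, while an unmarked RECEIVE with no SEND (`ff-3`) still is. Treating a missing `receive_mode` as passive is one way to keep the change opt-in for existing users and components.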
