+1 Joe - this is a good compromise to keep the original API undisturbed.
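
For what it's worth, here is a minimal sketch of how I read the proposal: keep the existing two-argument receive() untouched and add an overload that carries the mode. Everything below is hypothetical and only illustrates the shape of the change. ReceiveMode, SketchReporter, RecordingReporter and ReceiveEvent are made-up names, the flowfile is reduced to a String id, and none of this is the actual NiFi ProvenanceReporter API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical enum for the proposed extra argument; not part of NiFi today.
enum ReceiveMode { ACTIVE, PASSIVE, UNSPECIFIED }

// Stand-in for a provenance event; the real record carries far more fields.
final class ReceiveEvent {
    final String flowFileId;
    final String transitUri;
    final ReceiveMode mode;

    ReceiveEvent(String flowFileId, String transitUri, ReceiveMode mode) {
        this.flowFileId = flowFileId;
        this.transitUri = transitUri;
        this.mode = mode;
    }
}

// Stand-in for ProvenanceReporter, reduced to the two receive() variants.
interface SketchReporter {
    // New, opt-in overload that records how the data was acquired.
    void receive(String flowFileId, String transitUri, ReceiveMode mode);

    // Existing-style call: callers do not change, and the event is
    // reported as UNSPECIFIED.
    default void receive(String flowFileId, String transitUri) {
        receive(flowFileId, transitUri, ReceiveMode.UNSPECIFIED);
    }
}

// Toy implementation that just collects events in memory.
final class RecordingReporter implements SketchReporter {
    final List<ReceiveEvent> events = new ArrayList<>();

    @Override
    public void receive(String flowFileId, String transitUri, ReceiveMode mode) {
        events.add(new ReceiveEvent(flowFileId, transitUri, mode));
    }
}

final class ReceiveModeSketch {
    public static void main(String[] args) {
        RecordingReporter reporter = new RecordingReporter();
        // Legacy call, e.g. an existing third-party processor.
        reporter.receive("ff-1", "https://nifi-2.example/listener");
        // GetSFTP-style pull that opts in to the new overload.
        reporter.receive("ff-2", "sftp://nifi-1.example/data/in.txt", ReceiveMode.ACTIVE);
        for (ReceiveEvent e : reporter.events) {
            System.out.println(e.flowFileId + " RECEIVE " + e.mode + " " + e.transitUri);
        }
    }
}
```

The point of the default method is that the mode lives on the event, not on the flowfile, so nothing lingers as an attribute further down the flow and existing callers compile unchanged.
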

On Wed, Nov 6, 2019 at 11:05 AM Joe Witt <[email protected]> wrote:

> Nissim
>
> Notionally I am saying that session.getProvenanceReporter().receive(...)
> should have an option to call
> session.getProvenanceReporter().receive(...,ACTIVE|PASSIVE) and if not
> specified it would be UNSPECIFIED.
>
> I don't think this needs to be on the flowfile attribute - it would go
> straight to the provenance event itself which is generated by the session.
>
> Thanks
> Joe
>
> On Wed, Nov 6, 2019 at 11:32 AM Nissim Shiman <[email protected]>
> wrote:
>
> > Joe,
> >
> > Just to verify what you mean,
> >
> > You are saying that the line:
> > flowfile = session.putAttribute(flowfile, "receiveType", "active")
> >
> > could be added before
> > session.getProvenanceReporter().receive(...)
> >
> > to indicate a PULL. Is this correct?
> >
> > Thanks,
> >
> > Nissim
> >
> > On Monday, November 4, 2019, 12:50:11 PM EST, Nissim Shiman
> > <[email protected]> wrote:
> >
> > Having an attribute added indicating passive/active/query for RECEIVE
> > and FETCH will work,
> >
> > but nifi attributes are stateful (i.e. they will still be on the flowfile
> > as metadata a couple of processor steps down the flow)
> >
> > Maybe an option is to expand the API for RECEIVE and FETCH with a
> > new parameter for passive/active/query?
> > (i.e. the existing message signatures, such as [1], will remain the same,
> > but new ones will be added to handle this new parameter?)
> >
> > [1]
> > https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceReporter.java#L46
> >
> > On Thursday, October 31, 2019, 10:10:40 PM EDT, Joe Witt <
> > [email protected]> wrote:
> >
> > These distinctions may be meaningful. Adding them as an attribute lets the
> > meaning convey but not introduce complexity for the majority case which is
> > the distinction isn't key.
> >
> > thanks
> >
> > On Thu, Oct 31, 2019 at 4:05 PM Nissim Shiman <[email protected]>
> > wrote:
> >
> > > Mike,
> > > I like the QUERY type as well. Basically a more refined PULL. Very nice.
> > >
> > > Part of the challenge of adding PULL as a type is that there are
> > > currently two flavors of RECEIVEs.
> > > RECEIVE and FETCH [1]
> > >
> > > So any addition of a PULL would need a second flavor of PULL to match the
> > > case where a flowfile's contents are being overwritten as well (i.e. as
> > > FETCH is currently doing)
> > >
> > > [1]
> > > https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java#L42
> > >
> > > Thanks,
> > > Nissim
> > >
> > > On Wednesday, October 30, 2019, 6:41:04 PM EDT, Mike Thomsen <
> > > [email protected]> wrote:
> > >
> > > I like the idea of creating PULL as a type. In fact, I'd propose that
> > > there are three scenarios here:
> > >
> > > RECEIVE - Passively acquire in a sort of hand-off situation. Ex: Kafka
> > > subscription
> > > PULL - Direct operations to seek out and fetch something in a targeted
> > > fashion. Ex. GetHttp
> > > QUERY - Go looking for the data and take what matches your search. Ex.
> > > JsonQueryElasticsearch, GetMongo, any SQL query processor, etc.
> > >
> > > On Wed, Oct 30, 2019 at 1:31 PM Nissim Shiman <[email protected]>
> > > wrote:
> > >
> > > > Joe,
> > > >
> > > > It is hard to say how much value transit URI would bring to clarify a
> > > > RECEIVE.
> > > > For example a RECEIVE with transit URI of https:<etc.> could be either
> > > > a GetHTTP (i.e. active) or ListenHTTP (i.e. passive)
> > > >
> > > > but your idea of "a metadata item specifying active vs passive" is a
> > > > very clever way to make this work with minimal disruptions.
> > > >
> > > > My understanding of this is that the current receive() calls in
> > > > ProvenanceReporter [1] will remain the same, but new ones will be
> > > > added with a boolean parameter reflecting if the receive is active or
> > > > passive.
> > > > This will allow the current list of Provenance Events [2] to remain the
> > > > same. So third party/custom processors can continue working as is
> > > >
> > > > Does this sound like what you are thinking?
> > > >
> > > > [1]
> > > > https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceReporter.java#L46
> > > >
> > > > [2]
> > > > apache/nifi
> > > >
> > > > Thanks,
> > > >
> > > > Nissim
> > > >
> > > > On Tuesday, October 29, 2019, 12:47:40 PM EDT, Joe Witt <
> > > > [email protected]> wrote:
> > > >
> > > > Nissim
> > > >
> > > > I like the idea to introduce a more refined type of event for how data
> > > > is brought into nifi (active - PULL, passive - RECEIVE).
> > > >
> > > > That said it might be sufficient to simply have this distinction be on
> > > > the "RECEIVE" event as a metadata item specifying active vs passive.
> > > > The protocol utilized as mentioned in the transit URI should clarify
> > > > this though.
> > > >
> > > > In short - I think there is a way here that is all opt-in for existing
> > > > users and components.
> > > >
> > > > Thanks
> > > >
> > > > On Tue, Oct 29, 2019 at 9:41 AM Nissim Shiman <[email protected]>
> > > > wrote:
> > > >
> > > > > Adam,
> > > > > good points...
> > > > > I missed a step in explaining the use case where Provenance Events is
> > > > > incomplete...
> > > > > Where the second nifi does a GetSFTP from the *filesystem* that the
> > > > > first nifi is located on
> > > > > So the second nifi currently sends a RECEIVE event, but there is no
> > > > > corresponding SEND event from the first nifi (nor should there be)
> > > > > If the second nifi sent a PULL event, it would be easier for a system
> > > > > overseer to know that there should be no corresponding SEND event
> > > > >
> > > > > Currently send/receive works well when nifi 1 does a PostHTTP and
> > > > > nifi 2 does a ListenHTTP, but not in the case above.
> > > > >
> > > > > The ERROR case you mention is a nice point as well, although not my
> > > > > specific issue at the moment.
> > > > > Thanks,
> > > > > Nissim
> > > > >
> > > > > On Monday, October 28, 2019, 11:52:57 PM EDT, Adam Taft <
> > > > > [email protected]> wrote:
> > > > >
> > > > > > But a flowfile that was PULLed by the second nifi (from the first
> > > > > > nifi) will not necessarily have any provenance event generated by
> > > > > > the first nifi.
> > > > >
> > > > > Isn't this the fault of the first NiFi to fail to emit a SEND event
> > > > > in response to the second NiFi's request? In this scenario, shouldn't
> > > > > the send/receive pair be:
> > > > > NiFi-1 [SEND] :: NIFI-2 [RECEIVE]?
> > > > >
> > > > > What you describe is an odd use case for NiFi. NiFi is usually not in
> > > > > the business of acting as a file server daemon in order to "send"
> > > > > flowfiles to other systems. As you mention, HandleHttpResponse may be
> > > > > a lone wolf example processor which generates a SEND event whose
> > > > > input originates from a "listener". [1] The other ListenXYZ
> > > > > processors generally issue RECEIVE events because they are receiving
> > > > > bytes, not generating them.
> > > > >
> > > > > Are there other processors in question? Something custom?
> > > > > Or is this related to site-to-site transfers?
> > > > >
> > > > > I still kind of question the motive of a provenance event pair that
> > > > > is trying to establish "who called who first". Honestly just trying
> > > > > to understand the use case where a matching SEND/RECEIVE pair doesn't
> > > > > give you what you need.
> > > > >
> > > > > The only thing I could see would be a processor that asks for data,
> > > > > but then doesn't receive it due to some error condition. In this
> > > > > case, adding some sort of ERROR event might be useful. "I attempted
> > > > > to retrieve data from ${uri}, but the transfer failed because of
> > > > > ${error condition}". That way, GetXYZ processors could report an
> > > > > error to provenance instead of as a bulletin.
> > > > >
> > > > > If the problem is related to a processor or the framework itself not
> > > > > generating an event, can we just fix that function to emit SEND in
> > > > > the scenario that you describe? Changing the provenance model itself
> > > > > (beyond possibly adding an ERROR event) feels like it would be the
> > > > > last scenario to consider.
> > > > >
> > > > > Thanks,
> > > > > Adam
> > > > >
> > > > > [1]
> > > > > https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpResponse.java#L191
> > > > >
> > > > > On Mon, Oct 28, 2019 at 4:47 PM Nissim Shiman <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Adam,
> > > > > > I believe there is a need for more detailed ProvenanceEvents.
> > > > > > A use case would be a customer that is trying to track data passed
> > > > > > between two nifi's and trying to match up SENDs and RECEIVEs
> > > > > >
> > > > > > So a flowfile that has a SEND event on the first nifi should have a
> > > > > > RECEIVE event on the second nifi.
> > > > > > But a flowfile that was PULLed by the second nifi (from the first
> > > > > > nifi) will not necessarily have any provenance event generated by
> > > > > > the first nifi.
> > > > > >
> > > > > > (I realize that FETCH is already a "reserved word" in the current
> > > > > > ProvenanceEvents setup, so I was hoping PULL could be used
> > > > > > instead.)
> > > > > > There is another Provenance Event, ACKNOWLEDGE, which would also
> > > > > > fit occasionally to this model as well (an example would be
> > > > > > HandleHttpResponse processor which could send this instead of SEND
> > > > > > when responding to a HTTP request)
> > > > > > This being said, you make an excellent point when you said
> > > > > > "However even more important to realize,
> > > > > > this change would affect many other downstream consumers of
> > > > > > provenance data which aren't necessarily in the stock NiFi
> > > > > > distribution."
> > > > > > Thanks,
> > > > > > Nissim
> > > > > >
> > > > > > On Friday, October 11, 2019, 11:30:19 AM EDT, Nissim Shiman
> > > > > > <[email protected]> wrote:
> > > > > >
> > > > > > Adam,
> > > > > > "Yes" to your first question and the four processor examples you
> > > > > > listed.
> > > > > >
> > > > > > I will need to get back to you regarding your other points.
> > > > > >
> > > > > > Thanks,
> > > > > > Nissim
> > > > > >
> > > > > > On Thursday, October 10, 2019, 7:05:57 PM EDT, Adam Taft <
> > > > > > [email protected]> wrote:
> > > > > >
> > > > > > Nissim,
> > > > > >
> > > > > > Just to be clear, you are trying to distinguish between processors
> > > > > > which are actively "pulling" data (GetXYZ) vs. processors which
> > > > > > just "listen" for data (ListenXYZ)? Is that your basic vision?
> > > > > >
> > > > > > GetFile => PULL
> > > > > > GetHTTP => PULL
> > > > > > ListenHTTP => RECEIVE
> > > > > > ListenTCP => RECEIVE
> > > > > >
> > > > > > Could you clarify what advantages this would have in terms of data
> > > > > > provenance? What would you use this new event type for
> > > > > > specifically? What are you missing now? Do you have a use case that
> > > > > > needs this, or are you just generally trying to round out the
> > > > > > provenance event types for sake of completeness? I honestly don't
> > > > > > know a use case where you care whether you polled for the data or
> > > > > > listened for it. The provenance model today just cares that you
> > > > > > received the data, not so much how you received it.
> > > > > >
> > > > > > You're right that this proposal will affect many processors and the
> > > > > > internal visualization tools, etc. However even more important to
> > > > > > realize, this change would affect many other downstream consumers
> > > > > > of provenance data which aren't necessarily in the stock NiFi
> > > > > > distribution. For example, any third-party/custom ReportingTask
> > > > > > that handles provenance data would need to be updated with this
> > > > > > change. There's probably need for a strong vision to help
> > > > > > demonstrate the value for this vs. the cost of the cascading
> > > > > > effects related to this change.
> > > > > >
> > > > > > Thanks,
> > > > > > Adam
> > > > > >
> > > > > > On Thu, Oct 10, 2019 at 4:02 PM Nissim Shiman <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > Hello Team,
> > > > > > >
> > > > > > > The ProvenanceEventType class does a good job capturing possible
> > > > > > > events, but the PULL event doesn't seem to fall nicely into any
> > > > > > > of the existing types.
> > > > > > >
> > > > > > > https://gitbox.apache.org/repos/asf?p=nifi.git;a=blob;f=nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java
> > > > > > >
> > > > > > > RECEIVE is the closest, but RECEIVE is passive and doesn't
> > > > > > > capture the active action of a PULL
> > > > > > >
> > > > > > > And... maybe it would fall into FETCH, but FETCH is more focused
> > > > > > > on contents of an existing flow file being overwritten.
> > > > > > >
> > > > > > > What does the community think about a new PULL event type,
> > > > > > > or
> > > > > > > using FETCH for PULL, and having what FETCH does now be a new
> > > > > > > event such as REUSE
> > > > > > >
> > > > > > > NOTE: a new PULL event would have a cascading effect of many
> > > > > > > processors that currently are emitting RECEIVE's being modified
> > > > > > > to be PULL
> > > > > > > (i.e. So GetFile would no longer be a RECEIVE, but rather a
> > > > > > > PULL), but would more accurately capture the event.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Nissim Shiman
