I like the idea of creating PULL as a type. In fact, I'd propose that there
are three scenarios here:

RECEIVE - Passively acquire data in a hand-off situation. Ex.: a Kafka
subscription
PULL - Directly seek out and fetch something in a targeted fashion.
Ex.: GetHTTP
QUERY - Go looking for the data and take what matches your search. Ex.:
JsonQueryElasticsearch, GetMongo, any SQL query processor, etc.
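
To make the three-way split a bit more concrete, here is a rough Java
sketch. The names AcquisitionKind and AcquisitionExamples are made up for
illustration; nothing like them exists in nifi-api today, and the mapping
only restates the processor examples already mentioned in this thread. It
is a sketch of the idea, not a proposed API.

```java
import java.util.Map;

// Hypothetical classification of how data enters a flow.
// These constants are NOT existing NiFi ProvenanceEventType values.
enum AcquisitionKind {
    RECEIVE, // passive hand-off: something else pushes the data to us
    PULL,    // active, targeted fetch of a known resource
    QUERY    // active search: take whatever matches the criteria
}

// Hypothetical helper that maps a few example processor names to a kind.
final class AcquisitionExamples {
    private static final Map<String, AcquisitionKind> EXAMPLES = Map.of(
            "ListenHTTP", AcquisitionKind.RECEIVE,
            "GetHTTP", AcquisitionKind.PULL,
            "JsonQueryElasticsearch", AcquisitionKind.QUERY,
            "GetMongo", AcquisitionKind.QUERY);

    private AcquisitionExamples() {
    }

    // Returns the kind for a known example processor name, or throws
    // IllegalArgumentException for a name that is not in the table.
    static AcquisitionKind kindOf(String processorName) {
        AcquisitionKind kind = EXAMPLES.get(processorName);
        if (kind == null) {
            throw new IllegalArgumentException(
                    "No example mapping for " + processorName);
        }
        return kind;
    }
}
```

For example, AcquisitionExamples.kindOf("GetHTTP") returns
AcquisitionKind.PULL, while a processor name outside the table is rejected
rather than silently treated as RECEIVE.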



On Wed, Oct 30, 2019 at 1:31 PM Nissim Shiman <nshi...@yahoo.com.invalid>
wrote:

>  Joe,
>
>
> It is hard to say how much value transit URI would bring to clarify a
> RECEIVE.
> For example a RECEIVE with transit URI of https:<etc.> could be either a
> GetHTTP (i.e. active) or ListenHTTP (i.e. passive)
>
> but your idea of "a metadata item specifying active vs passive" is a very
> clever way to make this work with minimal disruption.
>
> My understanding of this is that the current receive() calls in
> ProvenanceReporter [1] will remain the same, but new ones will be added
> with a boolean parameter reflecting if the receive is active or passive.
> This will allow the current list of Provenance Events [2] to remain the
> same, so third-party/custom processors can continue working as is.
>
> Does this sound like what you are thinking?
>
>
> [1]
> https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceReporter.java#L46
>
> [2]
> https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java
>
>
> Thanks,
>
> Nissim
>     On Tuesday, October 29, 2019, 12:47:40 PM EDT, Joe Witt <
> joe.w...@gmail.com> wrote:
>
>  Nissim
>
> I like the idea to introduce a more refined type of event for how data is
> brought into nifi (active - PULL, passive - RECEIVE).
>
> That said, it might be sufficient to simply have this distinction be on the
> "RECEIVE" event as a metadata item specifying active vs passive.  The
> protocol utilized, as mentioned in the transit URI, should clarify this
> though.
>
> In short - I think there is a way here that is all opt-in for existing
> users and components.
>
> Thanks
>
> On Tue, Oct 29, 2019 at 9:41 AM Nissim Shiman <nshi...@yahoo.com.invalid>
> wrote:
>
> >  Adam,
> > good points...
> > I missed a step in explaining the use case where Provenance Events are
> > incomplete:
> > the second nifi does a GetSFTP from the *filesystem* that the first
> > nifi is located on.
> > So the second nifi currently emits a RECEIVE event, but there is no
> > corresponding SEND event from the first nifi (nor should there be).
> > If the second nifi emitted a PULL event, it would be easier for a system
> > overseer to know that there should be no corresponding SEND event.
> >
> > Currently send/receive works well when nifi 1 does a PostHTTP and nifi 2
> > does a ListenHTTP, but not in the case above.
> >
> > The ERROR case you mention is a nice point as well, although not my
> > specific issue at the moment.
> > Thanks,
> > Nissim
> >
> >
> >
> >
> >
> >    On Monday, October 28, 2019, 11:52:57 PM EDT, Adam Taft <
> > a...@adamtaft.com> wrote:
> >
> >  > But a flowfile that was PULLed by the second nifi (from the first
> >  > nifi) will not necessarily have any provenance event generated by
> >  > the first nifi.
> >
> > Isn't this the fault of the first NiFi for failing to emit a SEND event
> > in response to the second NiFi's request?  In this scenario, shouldn't
> > the send/receive pair be:
> > NiFi-1 [SEND] :: NiFi-2 [RECEIVE]?
> >
> > What you describe is an odd use case for NiFi.  NiFi is usually not in
> > the business of acting as a file server daemon in order to "send"
> > flowfiles to other systems.  As you mention, HandleHttpResponse may be a
> > lone wolf example processor which generates a SEND event whose input
> > originates from a "listener". [1]  The other ListenXYZ processors
> > generally issue RECEIVE events because they are receiving bytes, not
> > generating them.
> >
> > Are there other processors in question? Something custom? Or is this
> > related to site-to-site transfers?
> >
> > I still kind of question the motive of a provenance event pair that is
> > trying to establish "who called who first".  Honestly just trying to
> > understand the use case where a matching SEND/RECEIVE pair doesn't give
> > you what you need.
> >
> > The only thing I could see would be a processor that asks for data, but
> > then doesn't receive it due to some error condition.  In this case,
> > adding some sort of ERROR event might be useful.  "I attempted to
> > retrieve data from ${uri}, but the transfer failed because of
> > ${error condition}".  That way, GetXYZ processors could report an error
> > to provenance instead of as a bulletin.
> >
> > If the problem is related to a processor or the framework itself not
> > generating an event, can we just fix that function to emit SEND in the
> > scenario that you describe?  Changing the provenance model itself (beyond
> > possibly adding an ERROR event) feels like it would be the last scenario
> > to consider.
> >
> > Thanks,
> > Adam
> >
> > [1]
> > https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpResponse.java#L191
> >
> >
> >
> >
> > On Mon, Oct 28, 2019 at 4:47 PM Nissim Shiman <nshi...@yahoo.com.invalid>
> > wrote:
> >
> > >  Adam,
> > > I believe there is a need for more detailed ProvenanceEvents.
> > > A use case would be a customer that is trying to track data passed
> > > between two nifis and trying to match up SENDs and RECEIVEs.
> > >
> > > So a flowfile that has a SEND event on the first nifi should have a
> > > RECEIVE event on the second nifi.
> > > But a flowfile that was PULLed by the second nifi (from the first nifi)
> > > will not necessarily have any provenance event generated by the first
> > > nifi.
> > >
> > > (I realize that FETCH is already a "reserved word" in the current
> > > ProvenanceEvents setup, so I was hoping PULL could be used instead.)
> > > There is another Provenance Event, ACKNOWLEDGE, which would also
> > > occasionally fit this model (an example would be the
> > > HandleHttpResponse processor, which could send this instead of SEND
> > > when responding to an HTTP request).
> > > This being said, you made an excellent point when you said
> > > "However even more important to realize, this change would affect many
> > > other downstream consumers of provenance data which aren't necessarily
> > > in the stock NiFi distribution."
> > > Thanks,
> > > Nissim
> > >
> > >    On Friday, October 11, 2019, 11:30:19 AM EDT, Nissim Shiman
> > > <nshi...@yahoo.com.invalid> wrote:
> > >
> > >  Adam,
> > > "Yes" to your first question and the four processor examples you
> > > listed.
> > >
> > > I will need to get back to you regarding your other points.
> > >
> > > Thanks,
> > > Nissim
> > >
> > >    On Thursday, October 10, 2019, 7:05:57 PM EDT, Adam Taft <
> > > a...@adamtaft.com> wrote:
> > >
> > >  Nissim,
> > >
> > > Just to be clear, you are trying to distinguish between processors
> > > which are actively "pulling" data (GetXYZ) vs. processors which just
> > > "listen" for data (ListenXYZ)?  Is that your basic vision?
> > >
> > > GetFile => PULL
> > > GetHTTP => PULL
> > > ListenHTTP => RECEIVE
> > > ListenTCP => RECEIVE
> > >
> > > Could you clarify what advantages this would have in terms of data
> > > provenance?  What would you use this new event type for specifically?
> > > What are you missing now? Do you have a use case that needs this, or
> > > are you just generally trying to round out the provenance event types
> > > for sake of completeness?  I honestly don't know a use case where you
> > > care whether you polled for the data or listened for it.  The
> > > provenance model today just cares that you received the data, not so
> > > much how you received it.
> > >
> > > You're right that this proposal will affect many processors and the
> > > internal visualization tools, etc.  However even more important to
> > > realize, this change would affect many other downstream consumers of
> > > provenance data which aren't necessarily in the stock NiFi
> > > distribution.  For example, any third-party/custom ReportingTask that
> > > handles provenance data would need to be updated with this change.
> > > There's probably need for a strong vision to help demonstrate the value
> > > for this vs. the cost of the cascading effects related to this change.
> > >
> > > Thanks,
> > > Adam
> > >
> > >
> > > On Thu, Oct 10, 2019 at 4:02 PM Nissim Shiman
> > > <nshi...@yahoo.com.invalid> wrote:
> > >
> > > > Hello Team,
> > > >
> > > > The ProvenanceEventType class does a good job capturing possible
> > > > events, but the PULL event doesn't seem to fall nicely into any of
> > > > the existing types.
> > > >
> > > >
> > > > https://gitbox.apache.org/repos/asf?p=nifi.git;a=blob;f=nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java
> > > > RECEIVE is the closest, but RECEIVE is passive and doesn't capture
> > > > the active action of a PULL.
> > > >
> > > > And... maybe it would fall into FETCH, but FETCH is more focused on
> > > > contents of an existing flow file being overwritten.
> > > >
> > > > What does the community think about a new PULL event type, or using
> > > > FETCH for PULL and having what FETCH does now be a new event such as
> > > > REUSE?
> > > >
> > > > NOTE: a new PULL event would have a cascading effect: many
> > > > processors that currently emit RECEIVEs would be modified to emit
> > > > PULLs (i.e. GetFile would no longer be a RECEIVE, but rather a PULL).
> > > > It would, however, more accurately capture the event.
> > > >
> > > > Thanks,
> > > > Nissim Shiman
> > > >
> > > >
> > >
> >
>
