[
https://issues.apache.org/jira/browse/NIFI-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16298037#comment-16298037
]
ASF GitHub Bot commented on NIFI-4707:
--------------------------------------
Github user ijokarumawak commented on the issue:
https://github.com/apache/nifi/pull/2351
Hi @mattyb149 Thanks for updating this PR. It mostly looks good; however,
while I was testing, I found a few points that can be improved. I went ahead and
added the following improvements on top of your commits. Would you cherry-pick
this commit?
https://github.com/ijokarumawak/nifi/commit/8effe3b19681ac34594a2f33e9d049ef081730a6
1. For a "Remote Input/Output Port", the port name and process group id can
only be retrieved by mapping the ConnectionStatus source or destination
component id.
2. When a ProcessGroupId is used to filter events, the filtering should
consider the PG hierarchy: if PG1 is a child of Root, PG2 is a child of PG1,
and the PG1 uuid is used as a filter component id, then provenance events
happening in PG2 should also be reported.
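The hierarchy-aware filtering in point 2 could be sketched roughly as below. This is only an illustration, not the code in the commit: the class name PgHierarchyFilterSketch, the parent map, and the method names are all hypothetical, and the real change works against ComponentMapHolder.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the ComponentMapHolder implementation from the PR.
final class PgHierarchyFilterSketch {

    // Maps a process group id to its parent process group id.
    // The root group has no entry because it has no parent.
    private final Map<String, String> parentByGroupId = new HashMap<>();

    void addChildGroup(String parentId, String childId) {
        parentByGroupId.put(childId, parentId);
    }

    // Returns true if groupId itself, or any of its ancestors,
    // is listed in the filter component ids.
    boolean matches(String groupId, Set<String> filterIds) {
        Set<String> visited = new HashSet<>(); // guards against accidental cycles
        String current = groupId;
        while (current != null && visited.add(current)) {
            if (filterIds.contains(current)) {
                return true;
            }
            current = parentByGroupId.get(current);
        }
        return false;
    }
}
```

With Root -> PG1 -> PG2 registered and PG1 as the only filter id, matches("PG2", ...) and matches("PG1", ...) are true, while matches("Root", ...) is false.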
Other minor improvements:
- Simplified consumeEvents method signature
- Refactored the visibility of ComponentMapHolder methods
- Renamed componentMap to componentNameMap
- Throw an exception when the reporting task fails to send provenance
data, so that the current provenance event index is kept and the events can be
consumed again
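The idea behind the last item can be sketched as follows. This is a minimal illustration under assumptions, not the reporting task's code: EventCursorSketch, its field, and the Consumer-based sender are hypothetical stand-ins for the real send path.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of "advance the index only after a successful send".
final class EventCursorSketch {

    // Index of the next provenance event to be consumed.
    private long nextEventId = 0;

    long nextEventId() {
        return nextEventId;
    }

    // Sends one batch. If the sender throws, the exception propagates and
    // nextEventId is left unchanged, so the same events are consumed again
    // on the next run.
    void consume(List<String> batch, Consumer<List<String>> sender) {
        if (batch.isEmpty()) {
            return;
        }
        sender.accept(batch);        // may throw on a failed send
        nextEventId += batch.size(); // reached only when the send succeeded
    }
}
```

Swallowing the exception inside the sender would let the index advance past events that were never delivered, which is the case the change is meant to avoid.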
Thank you!
> SiteToSiteProvenanceReportingTask not returning correct metadata
> ----------------------------------------------------------------
>
> Key: NIFI-4707
> URL: https://issues.apache.org/jira/browse/NIFI-4707
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Reporter: Matt Burgess
> Assignee: Matt Burgess
>
> When the SiteToSiteProvenanceReportingTask emits flow files, some of them
> include a "componentName" field and some do not. Investigation shows that
> only the components (except connections) in the root process group have that
> field populated. Having this information can be very helpful to the user:
> even though the names might be duplicated, there would be a mapping between a
> component's ID and its name. At the very least, the behavior (i.e. component
> name being available) should be consistent.
> Having a full map (by traversing the entire flow) also opens up the ability
> to include Process Group information for the various components. The
> reporting task could include the parent Process Group identifier and/or name,
> with perhaps a special ID for the root PG's "parent", such as "@ROOT@" or
> something unique.
> This could also allow for a PG ID in the list of filtered "component IDs",
> where any provenance event for a processor in a particular PG could be
> included in a filter when that PG's ID is in the filter list.