Github user ijokarumawak commented on a diff in the pull request: https://github.com/apache/nifi/pull/2335#discussion_r157234799 --- Diff: nifi-nar-bundles/nifi-atlas-bundle/nifi-atlas-reporting-task/src/main/resources/docs/org.apache.nifi.atlas.reporting.AtlasNiFiFlowLineage/additionalDetails.html --- @@ -0,0 +1,541 @@ +<!DOCTYPE html> +<html lang="en"> +<!-- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+--> + <head> + <meta charset="utf-8" /> + <title>AtlasNiFiFlowLineage</title> + <link rel="stylesheet" href="/nifi-docs/css/component-usage.css" type="text/css" /> + </head> + + <body> + <h2>AtlasNiFiFlowLineage</h2> + + Table of contents: + <!-- TODO: Fix header tags most h4 should be h3 --> + <ul> + <li><a href="#how-it-works">Information reported to Atlas</a></li> + <li><a href="#nifi-atlas-types">NiFi Atlas Types</a></li> + <li><a href="#cluster-name">Cluster Name Resolution</a></li> + <li><a href="#nifi-flow-structure">NiFi flow structure</a> + <ul> + <li><a href="#path-separation">Path Separation Logic</a></li> + </ul> + </li> + <li><a href="#nifi-data-lineage">NiFi data lineage</a> + <ul> + <li><a href="#lineage-strategy">NiFi Lineage Strategy</a></li> + <li><a href="#provenance-events">NiFi Provenance Event Analysis</a></li> + <li><a href="#datasets-and-processors">Supported DataSets and Processors</a></li> + </ul> + </li> + <li><a href="#runs-in-cluster">How it runs in NiFi cluster</a></li> + <li><a href="#limitations">Limitations</a></li> + <li><a href="#atlas-configs">Atlas Server Configurations</a></li> + <li><a href="#atlas-emulator">Atlas Server Emulator</a></li> + </ul> + + <h3 id="how-it-works">Information reported to Atlas</h3> + <p>This reporting task stores two types of NiFi flow information, 'NiFi flow structure' and 'NiFi data lineage'.</p> + + <p>'NiFi flow structure' describes which components are running within a NiFi flow and how they are connected. It is reported by analyzing the current NiFi flow structure, specifically NiFi component relationships.</p> + + <p>'NiFi data lineage' describes which parts of a NiFi flow interact with different DataSets, such as HDFS files or Hive tables. 
It is reported by analyzing NiFi provenance events.</p> + + <object data="nifi_atlas.svg" type="image/svg+xml" width="60%"></object> + + <p>Technically, each type of information is sent using a different protocol, Atlas REST API v2 and notification via a Kafka topic, as shown in the image above.</p> + + + <p>As both information types use the same <a href="#nifi-atlas-types">NiFi Atlas Types</a> and <a href="#cluster-name">Cluster Name Resolution</a> concepts, it is recommended to read those sections first.</p> + + <h4 id="nifi-atlas-types">NiFi Atlas Types</h4> + + <p>When it runs, this reporting task creates the following NiFi-specific types in the Atlas type system if these type definitions are not found.</p> + + <p>Green boxes represent sub-types of DataSet and blue ones are sub-types of Process. Gray lines represent entity ownership. + Red lines represent lineage.</p> + + <object data="nifi_types.svg" type="image/svg+xml" width="60%"></object> + + <ul> + <li>nifi_flow + <p>Represents a NiFi data flow.</p> + <p>As shown in the above diagram, nifi_flow owns other nifi_component types. + This owning relationship is defined by the Atlas 'owned' constraint so that when a 'nifi_flow' entity is removed, all owned NiFi component entities are removed in a cascading manner.</p> + <p>When this reporting task runs, it analyzes and traverses the entire flow structure, and creates NiFi component entities in Atlas. + On later runs, it compares the current flow structure with the one stored in Atlas to determine whether any changes have been made since the last time the flow was reported. The reporting task updates NiFi component entities in Atlas if needed.</p> + <p>NiFi components that are removed from a NiFi flow are also deleted from Atlas. + However, those entities can still be seen in Atlas search results or lineage graphs since Atlas uses 'Soft Delete' by default. 
+ See <a href="#delete-handler">Atlas Delete Handler</a> for further detail.</p> + </li> + Attributes: + <ul> + <li>qualifiedName: Root ProcessGroup ID@clusterName (e.g. 86420a14-2fab-3e1e-4331-fb6ab42f58e0@cl1)</li> + <li>name: Name of the Root ProcessGroup.</li> + <li>url: URL of the NiFi instance. This can be specified via the reporting task's 'NiFi URL for Atlas' property.</li> + </ul> + </ul> + <ul> + <li>nifi_flow_path <p>Part of a NiFi data flow containing one or more processing NiFi components such as Processors and RemoteGroupPorts. The reporting task divides a NiFi flow into multiple flow paths. See <a href="#path-separation">Path Separation Logic</a> for details.</p></li> + Attributes: + <ul> + <li>qualifiedName: The first NiFi component Id in a path@clusterName (e.g. 529e6722-9b49-3b66-9c94-00da9863ca2d@cl1)</li> + <li>name: NiFi component names within a path are concatenated (e.g. GenerateFlowFile, PutFile, LogAttribute)</li> + <li>url: A deep link to the first NiFi component in the corresponding NiFi UI</li> + </ul> + </ul> + <ul> + <!-- TODO: link to S2S details --> --- End diff -- Forgot to delete the TODO. I had intended to write about the details, but I think they are too internal, so I have simply removed the TODO comment. I also found other TODOs remaining in this document and resolved them all.
---