GitHub user ijokarumawak commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2335#discussion_r157234799
  
    --- Diff: nifi-nar-bundles/nifi-atlas-bundle/nifi-atlas-reporting-task/src/main/resources/docs/org.apache.nifi.atlas.reporting.AtlasNiFiFlowLineage/additionalDetails.html ---
    @@ -0,0 +1,541 @@
    +<!DOCTYPE html>
    +<html lang="en">
    +<!--
    +  Licensed to the Apache Software Foundation (ASF) under one or more
    +  contributor license agreements.  See the NOTICE file distributed with
    +  this work for additional information regarding copyright ownership.
    +  The ASF licenses this file to You under the Apache License, Version 2.0
    +  (the "License"); you may not use this file except in compliance with
    +  the License.  You may obtain a copy of the License at
    +      http://www.apache.org/licenses/LICENSE-2.0
    +  Unless required by applicable law or agreed to in writing, software
    +  distributed under the License is distributed on an "AS IS" BASIS,
    +  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +  See the License for the specific language governing permissions and
    +  limitations under the License.
    +-->
    +    <head>
    +        <meta charset="utf-8" />
    +        <title>AtlasNiFiFlowLineage</title>
    +        <link rel="stylesheet" href="/nifi-docs/css/component-usage.css" type="text/css" />
    +    </head>
    +
    +    <body>
    +        <h2>AtlasNiFiFlowLineage</h2>
    +
    +        Table of contents:
    +        <!-- TODO: Fix header tags most h4 should be h3 -->
    +        <ul>
    +            <li><a href="#how-it-works">Information reported to Atlas</a></li>
    +            <li><a href="#nifi-atlas-types">NiFi Atlas Types</a></li>
    +            <li><a href="#cluster-name">Cluster Name Resolution</a></li>
    +            <li><a href="#nifi-flow-structure">NiFi flow structure</a>
    +                <ul>
    +                    <li><a href="#path-separation">Path Separation Logic</a></li>
    +                </ul>
    +            </li>
    +            <li><a href="#nifi-data-lineage">NiFi data lineage</a>
    +                <ul>
    +                    <li><a href="#lineage-strategy">NiFi Lineage Strategy</a></li>
    +                    <li><a href="#provenance-events">NiFi Provenance Event Analysis</a></li>
    +                    <li><a href="#datasets-and-processors">Supported DataSets and Processors</a></li>
    +                </ul>
    +            </li>
    +            <li><a href="#runs-in-cluster">How it runs in NiFi cluster</a></li>
    +            <li><a href="#limitations">Limitations</a></li>
    +            <li><a href="#atlas-configs">Atlas Server Configurations</a></li>
    +            <li><a href="#atlas-emulator">Atlas Server Emulator</a></li>
    +        </ul>
    +
    +        <h3 id="how-it-works">Information reported to Atlas</h3>
    +        <p>This reporting task stores two types of NiFi flow information: 'NiFi flow structure' and 'NiFi data lineage'.</p>
    +
    +        <p>'NiFi flow structure' tells what components are running within a NiFi flow and how they are connected. It is reported by analyzing the current NiFi flow structure, specifically NiFi component relationships.</p>
    +
    +        <p>'NiFi data lineage' tells which parts of a NiFi flow interact with different DataSets, such as HDFS files or Hive tables. It is reported by analyzing NiFi provenance events.</p>
    +
    +        <object data="nifi_atlas.svg" type="image/svg+xml" width="60%"></object>
    +
    +        <p>Technically, each type of information is sent using a different protocol, Atlas REST API v2 or notification via a Kafka topic, as shown in the image above.</p>
    +
    +
    +        <p>As both information types use the same <a href="#nifi-atlas-types">NiFi Atlas Types</a> and <a href="#cluster-name">Cluster Name Resolution</a> concepts, it is recommended to read those sections first.</p>
    +
    +        <h4 id="nifi-atlas-types">NiFi Atlas Types</h4>
    +
    +        <p>When it runs, this reporting task creates the following NiFi-specific types in the Atlas type system if their definitions are not found.</p>
    +
    +        <p>Green boxes represent sub-types of DataSet and blue ones are sub-types of Process. Gray lines represent entity ownership.
    +        Red lines represent lineage.</p>
    +
    +        <object data="nifi_types.svg" type="image/svg+xml" width="60%"></object>
    +
    +        <ul>
    +            <li>nifi_flow
    +                <p>Represents a NiFi data flow.</p>
    +                <p>As shown in the above diagram, nifi_flow owns other nifi_component types.
    +                    This owning relationship is defined by the Atlas 'owned' constraint, so that when a 'nifi_flow' entity is removed, all owned NiFi component entities are removed in a cascading manner.</p>
    +                <p>When this reporting task runs, it analyzes and traverses the entire flow structure, and creates NiFi component entities in Atlas.
    +                    On later runs, it compares the current flow structure with the one stored in Atlas to determine whether any changes have been made since the last time the flow was reported. The reporting task updates NiFi component entities in Atlas if needed.</p>
    +                <p>NiFi components that are removed from a NiFi flow are also deleted from Atlas.
    +                    However, those entities can still be seen in Atlas search results or lineage graphs, since Atlas uses 'Soft Delete' by default.
    +                    See <a href="#delete-handler">Atlas Delete Handler</a> for further details.</p>
    +            </li>
    +            Attributes:
    +            <ul>
    +                <li>qualifiedName: Root ProcessGroup ID@clusterName (e.g. 86420a14-2fab-3e1e-4331-fb6ab42f58e0@cl1)</li>
    +                <li>name: Name of the Root ProcessGroup.</li>
    +                <li>url: URL of the NiFi instance. This can be specified via reporting task 'NiFi URL for Atlas' property.</li>
    +            </ul>
    +        </ul>
    +        <ul>
    +            <li>nifi_flow_path <p>Part of a NiFi data flow containing one or more processing NiFi components such as Processors and RemoteGroupPorts. The reporting task divides a NiFi flow into multiple flow paths. See <a href="#path-separation">Path Separation Logic</a> for details.</p></li>
    +            Attributes:
    +            <ul>
    +                <li>qualifiedName: The first NiFi component Id in a path@clusterName (e.g. 529e6722-9b49-3b66-9c94-00da9863ca2d@cl1)</li>
    +                <li>name: NiFi component names within a path are concatenated (e.g. GenerateFlowFile, PutFile, LogAttribute)</li>
    +                <li>url: A deep link to the first NiFi component in the corresponding NiFi UI</li>
    +            </ul>
    +        </ul>
    +        <ul>
    +            <!-- TODO: link to S2S details -->
    --- End diff --
    
    Forgot to delete the TODO. I had intended to write about the details, but I think they are too internal, so I just removed the TODO comment. I also found other TODOs remaining in this document; all of them are now resolved.

