vladhlinsky commented on issue #88: ATLAS-3640 Update 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' relationship definitions
URL: https://github.com/apache/atlas/pull/88#issuecomment-592703591

I created the following functions to test the proposed changes without the Spark Atlas Connector:
```
# Each helper builds an Atlas notification message and publishes it to the
# ATLAS_HOOK Kafka topic, so the relationship defs can be exercised directly.

function create_ml_directory(){
  NAME=$1
  TIMESTAMP=$(($(date +%s%N)/1000000))
  ML_DIR="{\"version\":{\"version\":\"1.0.0\",\"versionParts\":[1]},\"msgCompressionKind\":\"NONE\",
  \"msgSplitIdx\":1,\"msgSplitCount\":1,\"msgSourceIP\":\"172.27.12.6\",\"msgCreatedBy\":\"test\",
  \"msgCreationTime\":$TIMESTAMP,\"message\":{\"type\":\"ENTITY_CREATE_V2\",\"user\":\"test\",
  \"entities\":{\"entities\":[{\"typeName\":\"spark_ml_directory\",\"attributes\":
  {\"qualifiedName\":\"$NAME\",\"name\":\"$NAME\",\"uri\":\"hdfs://\",\"directory\":\"/test\"},
  \"isIncomplete\":false,\"provenanceType\":0,\"version\":0,\"proxy\":false}]}}}"
  # unquoted expansion collapses the line breaks, so the producer receives a single-line message
  echo $ML_DIR | ./bin/kafka-console-producer.sh --topic ATLAS_HOOK --broker-list localhost:9092
}

function create_ml_model(){
  NAME=$1
  DIR_TYPE=$2
  DIR_NAME=$3
  TIMESTAMP=$(($(date +%s%N)/1000000))
  ML_MODEL="{\"version\":{\"version\":\"1.0.0\",\"versionParts\":[1]},\"msgCompressionKind\":\"NONE\",
  \"msgSplitIdx\":1,\"msgSplitCount\":1,\"msgSourceIP\":\"172.27.12.6\",\"msgCreatedBy\":\"test\",
  \"msgCreationTime\":$TIMESTAMP,\"message\":{\"type\":\"ENTITY_CREATE_V2\",\"user\":\"test\",
  \"entities\":{\"entities\":[{\"typeName\":\"spark_ml_model\",\"attributes\":
  {\"qualifiedName\":\"$NAME\",\"name\":\"$NAME\"},\"isIncomplete\":false,\"provenanceType\":0,
  \"version\":0,\"relationshipAttributes\":{\"directory\":{\"typeName\":\"$DIR_TYPE\",
  \"uniqueAttributes\":{\"qualifiedName\":\"$DIR_NAME\"}}},\"proxy\":false}]}}}"
  echo $ML_MODEL | ./bin/kafka-console-producer.sh --topic ATLAS_HOOK --broker-list localhost:9092
}

function create_ml_pipeline(){
  NAME=$1
  DIR_TYPE=$2
  DIR_NAME=$3
  TIMESTAMP=$(($(date +%s%N)/1000000))
  # note: this message is sent without the version envelope
  ML_PIPELINE="{\"type\":\"ENTITY_CREATE_V2\",\"user\":\"test\",\"entities\":{\"entities\":[{\"typeName\":
  \"spark_ml_pipeline\",\"attributes\":{\"qualifiedName\":\"$NAME\",\"name\":\"$NAME\"},\"isIncomplete\":
  false,\"provenanceType\":0,\"version\":0,\"relationshipAttributes\":{\"directory\":{\"typeName\":
  \"$DIR_TYPE\",\"uniqueAttributes\":{\"qualifiedName\":\"$DIR_NAME\"}}},\"proxy\":false}]}}}"
  echo $ML_PIPELINE | ./bin/kafka-console-producer.sh --topic ATLAS_HOOK --broker-list localhost:9092
}

function create_hdfs_path(){
  NAME=$1
  TIMESTAMP=$(($(date +%s%N)/1000000))
  HDFS_PATH="{\"version\":{\"version\":\"1.0.0\",\"versionParts\":[1]},\"msgCompressionKind\":\"NONE\",
  \"msgSplitIdx\":1,\"msgSplitCount\":1,\"msgSourceIP\":\"172.27.12.6\",\"msgCreatedBy\":\"test\",
  \"msgCreationTime\":$TIMESTAMP,\"message\":{\"type\":\"ENTITY_CREATE_V2\",\"user\":\"test\",
  \"entities\":{\"entities\":[{\"typeName\":\"hdfs_path\",\"attributes\":{\"path\":\"$NAME\",
  \"qualifiedName\":\"$NAME\",\"clusterName\":\"test\",\"name\":\"$NAME\"},\"isIncomplete\":false,
  \"provenanceType\":0,\"version\":0,\"proxy\":false}]}}}"
  echo $HDFS_PATH | ./bin/kafka-console-producer.sh --topic ATLAS_HOOK --broker-list localhost:9092
}
```
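As a side note, to double-check that a produced message was actually consumed by Atlas, a lookup through the entity REST API could be used; this is only a sketch, and the hostname, port, and `admin:admin` credentials are assumptions for a default local setup:
```
# Look up an entity by type and qualifiedName to confirm the Kafka message was processed
# (assumes a local Atlas server on port 21000 with default credentials).
function check_entity(){
  TYPE=$1
  NAME=$2
  curl -s -u admin:admin \
    "http://localhost:21000/api/atlas/v2/entity/uniqueAttribute/type/$TYPE?attr:qualifiedName=$NAME"
}

# e.g. check_entity spark_ml_model model_with_mldir
```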
The cases below work fine with the new relationship definitions that use the `directory` name:
```
create_ml_directory mldir
create_ml_model model_with_mldir spark_ml_directory mldir
create_hdfs_path path
create_ml_model model_with_path hdfs_path path
create_ml_model model_with_mldir hdfs_path path
create_ml_model model_with_path spark_ml_directory mldir
create_ml_directory mldir2
create_ml_pipeline pipeline_with_mldir spark_ml_directory mldir2
```
but the next case fails, as described in the previous comment:
```
create_hdfs_path path2
create_ml_pipeline pipeline_with_path hdfs_path path2
```
**I think the best way to resolve this would be to create a new relationship definition with a different name:**
```
{
  "name": "spark_ml_model_dataset",
  "serviceType": "spark",
  "typeVersion": "1.0",
  "relationshipCategory": "AGGREGATION",
  "endDef1": {
    "type": "spark_ml_model",
    "name": "dataset",
    "isContainer": true,
    "cardinality": "SINGLE"
  },
  "endDef2": {
    "type": "DataSet",
    "name": "model",
    "isContainer": false,
    "cardinality": "SINGLE"
  },
  "propagateTags": "NONE"
},
...
```
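For reference (not part of the original proposal), a relationship def like the one above could also be registered on a running instance through the typedefs REST API to try it out before touching the built-in Spark model files; the host, port, and credentials below are again assumptions for a default local setup:
```
# Register the proposed relationship def via the Atlas typedefs API
# (assumes a local Atlas server on port 21000 with default admin credentials).
curl -s -u admin:admin -X POST -H 'Content-Type: application/json' \
  'http://localhost:21000/api/atlas/v2/types/typedefs' \
  -d '{
    "relationshipDefs": [{
      "name": "spark_ml_model_dataset",
      "serviceType": "spark",
      "typeVersion": "1.0",
      "relationshipCategory": "AGGREGATION",
      "endDef1": { "type": "spark_ml_model", "name": "dataset", "isContainer": true,  "cardinality": "SINGLE" },
      "endDef2": { "type": "DataSet",        "name": "model",   "isContainer": false, "cardinality": "SINGLE" },
      "propagateTags": "NONE"
    }]
  }'
```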
