vladhlinsky commented on issue #88: ATLAS-3640 Update 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' relationship definitions
URL: https://github.com/apache/atlas/pull/88#issuecomment-592703591

I created the following functions to test the proposed changes without the Spark Atlas Connector:
```
# Each helper builds an Atlas notification message and publishes it to the
# ATLAS_HOOK Kafka topic, so the relationship defs can be exercised directly.

function create_ml_directory(){
  NAME=$1
  TIMESTAMP=$(($(date +%s%N)/1000000))
  ML_DIR="{\"version\":{\"version\":\"1.0.0\",\"versionParts\":[1]},\"msgCompressionKind\":\"NONE\",
  \"msgSplitIdx\":1,\"msgSplitCount\":1,\"msgSourceIP\":\"172.27.12.6\",\"msgCreatedBy\":\"test\",
  \"msgCreationTime\":$TIMESTAMP,\"message\":{\"type\":\"ENTITY_CREATE_V2\",\"user\":\"test\",
  \"entities\":{\"entities\":[{\"typeName\":\"spark_ml_directory\",\"attributes\":
  {\"qualifiedName\":\"$NAME\",\"name\":\"$NAME\",\"uri\":\"hdfs://\",\"directory\":\"/test\"},
  \"isIncomplete\":false,\"provenanceType\":0,\"version\":0,\"proxy\":false}]}}}"
  # unquoted expansion collapses the line breaks, so the producer receives a single-line message
  echo $ML_DIR | ./bin/kafka-console-producer.sh --topic ATLAS_HOOK --broker-list localhost:9092
}

function create_ml_model(){
  NAME=$1
  DIR_TYPE=$2
  DIR_NAME=$3
  TIMESTAMP=$(($(date +%s%N)/1000000))
  ML_MODEL="{\"version\":{\"version\":\"1.0.0\",\"versionParts\":[1]},\"msgCompressionKind\":\"NONE\",
  \"msgSplitIdx\":1,\"msgSplitCount\":1,\"msgSourceIP\":\"172.27.12.6\",\"msgCreatedBy\":\"test\",
  \"msgCreationTime\":$TIMESTAMP,\"message\":{\"type\":\"ENTITY_CREATE_V2\",\"user\":\"test\",
  \"entities\":{\"entities\":[{\"typeName\":\"spark_ml_model\",\"attributes\":
  {\"qualifiedName\":\"$NAME\",\"name\":\"$NAME\"},\"isIncomplete\":false,\"provenanceType\":0,
  \"version\":0,\"relationshipAttributes\":{\"directory\":{\"typeName\":\"$DIR_TYPE\",
  \"uniqueAttributes\":{\"qualifiedName\":\"$DIR_NAME\"}}},\"proxy\":false}]}}}"
  echo $ML_MODEL | ./bin/kafka-console-producer.sh --topic ATLAS_HOOK --broker-list localhost:9092
}

function create_ml_pipeline(){
  NAME=$1
  DIR_TYPE=$2
  DIR_NAME=$3
  TIMESTAMP=$(($(date +%s%N)/1000000))
  # note: this message is sent without the version envelope
  ML_PIPELINE="{\"type\":\"ENTITY_CREATE_V2\",\"user\":\"test\",\"entities\":{\"entities\":[{\"typeName\":
  \"spark_ml_pipeline\",\"attributes\":{\"qualifiedName\":\"$NAME\",\"name\":\"$NAME\"},\"isIncomplete\":
  false,\"provenanceType\":0,\"version\":0,\"relationshipAttributes\":{\"directory\":{\"typeName\":
  \"$DIR_TYPE\",\"uniqueAttributes\":{\"qualifiedName\":\"$DIR_NAME\"}}},\"proxy\":false}]}}}"
  echo $ML_PIPELINE | ./bin/kafka-console-producer.sh --topic ATLAS_HOOK --broker-list localhost:9092
}

function create_hdfs_path(){
  NAME=$1
  TIMESTAMP=$(($(date +%s%N)/1000000))
  HDFS_PATH="{\"version\":{\"version\":\"1.0.0\",\"versionParts\":[1]},\"msgCompressionKind\":\"NONE\",
  \"msgSplitIdx\":1,\"msgSplitCount\":1,\"msgSourceIP\":\"172.27.12.6\",\"msgCreatedBy\":\"test\",
  \"msgCreationTime\":$TIMESTAMP,\"message\":{\"type\":\"ENTITY_CREATE_V2\",\"user\":\"test\",
  \"entities\":{\"entities\":[{\"typeName\":\"hdfs_path\",\"attributes\":{\"path\":\"$NAME\",
  \"qualifiedName\":\"$NAME\",\"clusterName\":\"test\",\"name\":\"$NAME\"},\"isIncomplete\":false,
  \"provenanceType\":0,\"version\":0,\"proxy\":false}]}}}"
  echo $HDFS_PATH | ./bin/kafka-console-producer.sh --topic ATLAS_HOOK --broker-list localhost:9092
}
```
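As a side note, to double-check that a produced message was actually consumed by Atlas, a lookup through the entity REST API could be used; this is only a sketch, and the hostname, port, and `admin:admin` credentials are assumptions for a default local setup:
```
# Look up an entity by type and qualifiedName to confirm the Kafka message was processed
# (assumes a local Atlas server on port 21000 with default credentials).
function check_entity(){
  TYPE=$1
  NAME=$2
  curl -s -u admin:admin \
    "http://localhost:21000/api/atlas/v2/entity/uniqueAttribute/type/$TYPE?attr:qualifiedName=$NAME"
}

# e.g. check_entity spark_ml_model model_with_mldir
```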
The cases below work fine with the new relationship definitions that use the `directory` name:
```
create_ml_directory mldir
create_ml_model model_with_mldir spark_ml_directory mldir
create_hdfs_path path
create_ml_model model_with_path hdfs_path path
create_ml_model model_with_mldir hdfs_path path
create_ml_model model_with_path spark_ml_directory mldir
create_ml_directory mldir2
create_ml_pipeline pipeline_with_mldir spark_ml_directory mldir2
```
but the next case fails, as described in the previous comment:
```
create_hdfs_path path2
create_ml_pipeline pipeline_with_path hdfs_path path2
```
**I think the best way to resolve this would be to create a new relationship definition with a different name:**
```
{
  "name": "spark_ml_model_dataset",
  "serviceType": "spark",
  "typeVersion": "1.0",
  "relationshipCategory": "AGGREGATION",
  "endDef1": {
    "type": "spark_ml_model",
    "name": "dataset",
    "isContainer": true,
    "cardinality": "SINGLE"
  },
  "endDef2": {
    "type": "DataSet",
    "name": "model",
    "isContainer": false,
    "cardinality": "SINGLE"
  },
  "propagateTags": "NONE"
},
...
```
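For reference (not part of the original proposal), a relationship def like the one above could also be registered on a running instance through the typedefs REST API to try it out before touching the built-in Spark model files; the host, port, and credentials below are again assumptions for a default local setup:
```
# Register the proposed relationship def via the Atlas typedefs API
# (assumes a local Atlas server on port 21000 with default admin credentials).
curl -s -u admin:admin -X POST -H 'Content-Type: application/json' \
  'http://localhost:21000/api/atlas/v2/types/typedefs' \
  -d '{
    "relationshipDefs": [{
      "name": "spark_ml_model_dataset",
      "serviceType": "spark",
      "typeVersion": "1.0",
      "relationshipCategory": "AGGREGATION",
      "endDef1": { "type": "spark_ml_model", "name": "dataset", "isContainer": true,  "cardinality": "SINGLE" },
      "endDef2": { "type": "DataSet",        "name": "model",   "isContainer": false, "cardinality": "SINGLE" },
      "propagateTags": "NONE"
    }]
  }'
```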
