Vladislav Glinskiy created ATLAS-3646:
-----------------------------------------

             Summary: Create new 'spark_ml_model_dataset' and 
'spark_ml_pipeline_dataset' relationship definitions
                 Key: ATLAS-3646
                 URL: https://issues.apache.org/jira/browse/ATLAS-3646
             Project: Atlas
          Issue Type: Task
            Reporter: Vladislav Glinskiy
             Fix For: 2.1.0, 3.0.0


Create new 'spark_ml_model_dataset' and 'spark_ml_pipeline_dataset' 
relationship definitions. This is required in order to integrate Spark Atlas 
Connector's ML event processor.

Previously, Spark Atlas Connector used the 'spark_ml_directory' model for the 
ML model directory, along with the 'spark_ml_model_ml_directory' and 
'spark_ml_pipeline_ml_directory' relationship definitions. Usage of 
'spark_ml_directory' was reverted in the scope of 
[https://github.com/hortonworks-spark/spark-atlas-connector/issues/61] and 
[https://github.com/hortonworks-spark/spark-atlas-connector/pull/62], so the ML 
model directory is now a 'DataSet' entity (i.e. 'hdfs_path', 'fs_path').

Thus, new relationship definitions must be created, since there is no 
straightforward way to update the existing ones to use the 'DataSet' type 
instead of its child type 'spark_ml_directory'.
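
For illustration only, the two new definitions could be sketched roughly as 
below. This is a hedged sketch, not the committed model: the end type names 
'spark_ml_model' and 'spark_ml_pipeline', the attribute names, the 
'ASSOCIATION' category, the 'SINGLE' cardinality and the helper function 
relationship_def are all assumptions made for the example. Only the two 
relationship names and the 'DataSet' end type come from this issue.

```python
import json


def relationship_def(name, end1_type, end1_attr, end2_attr):
    """Build a relationshipDef dict whose second end is the generic 'DataSet' type.

    Hypothetical helper: category, cardinality and attribute names are assumed.
    """
    return {
        "name": name,
        "typeVersion": "1.0",
        "relationshipCategory": "ASSOCIATION",  # assumed category
        "propagateTags": "NONE",                # assumed propagation setting
        "endDef1": {
            "type": end1_type,                  # assumed Spark ML entity type
            "name": end1_attr,
            "isContainer": False,
            "cardinality": "SINGLE",
        },
        "endDef2": {
            "type": "DataSet",                  # parent type, not 'spark_ml_directory'
            "name": end2_attr,
            "isContainer": False,
            "cardinality": "SINGLE",
        },
    }


typedefs = {
    "relationshipDefs": [
        relationship_def("spark_ml_model_dataset",
                         "spark_ml_model", "dataset", "model"),
        relationship_def("spark_ml_pipeline_dataset",
                         "spark_ml_pipeline", "dataset", "pipeline"),
    ]
}

# Print the JSON fragment that a model file would carry.
print(json.dumps(typedefs, indent=2))
```

Pointing the second end at 'DataSet' lets any subtype, such as 'hdfs_path' or 
'fs_path', take part in the relationship.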

See:
 * ATLAS-3640
 * [https://github.com/apache/atlas/pull/88#issuecomment-592699723]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
