Vladislav Glinskiy created ATLAS-3646:
-----------------------------------------
Summary: Create new 'spark_ml_model_dataset' and
'spark_ml_pipeline_dataset' relationship definitions
Key: ATLAS-3646
URL: https://issues.apache.org/jira/browse/ATLAS-3646
Project: Atlas
Issue Type: Task
Reporter: Vladislav Glinskiy
Fix For: 2.1.0, 3.0.0
Create new 'spark_ml_model_dataset' and 'spark_ml_pipeline_dataset'
relationship definitions. These are required to integrate the Spark Atlas
Connector's ML event processor.
Previously, Spark Atlas Connector used the 'spark_ml_directory' model for the
ML model directory, together with the 'spark_ml_model_ml_directory' and
'spark_ml_pipeline_ml_directory' relationship definitions. Usage of
'spark_ml_directory' was reverted in the scope of
[https://github.com/hortonworks-spark/spark-atlas-connector/issues/61] and
[https://github.com/hortonworks-spark/spark-atlas-connector/pull/62], so the ML
model directory is now a 'DataSet' entity (i.e. 'hdfs_path', 'fs_path').
Thus, new relationship definitions must be created, since there is no
straightforward way to update the existing ones to use the 'DataSet' type
instead of its child type 'spark_ml_directory'.
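For illustration only, the two new definitions might look roughly like the
sketch below, which builds them as plain dictionaries in the general shape of
an Atlas typedef JSON and prints the result. Only the relationship names and
the 'DataSet' end type come from this issue. The entity type names
'spark_ml_model' and 'spark_ml_pipeline' are inferred from the relationship
names, and the end attribute names ('dataset', 'model', 'pipeline'), the
category, the cardinalities and the version are assumptions, not the values of
the actual patch.

```python
import json


def relationship_def(name, from_type, from_attr, to_type, to_attr):
    """Build one relationship definition as a plain dict.

    The field layout follows the general shape of Atlas relationshipDefs;
    the concrete values below are illustrative assumptions.
    """
    return {
        "name": name,
        "typeVersion": "1.0",                    # assumed version
        "relationshipCategory": "ASSOCIATION",   # assumed category
        "propagateTags": "NONE",                 # assumed propagation
        "endDef1": {
            "type": from_type,
            "name": from_attr,                   # hypothetical attribute name
            "isContainer": False,
            "cardinality": "SINGLE",             # assumed cardinality
        },
        "endDef2": {
            "type": to_type,                     # 'DataSet', not 'spark_ml_directory'
            "name": to_attr,                     # hypothetical attribute name
            "isContainer": False,
            "cardinality": "SINGLE",             # assumed cardinality
        },
    }


typedefs = {
    "relationshipDefs": [
        relationship_def("spark_ml_model_dataset",
                         "spark_ml_model", "dataset", "DataSet", "model"),
        relationship_def("spark_ml_pipeline_dataset",
                         "spark_ml_pipeline", "dataset", "DataSet", "pipeline"),
    ]
}

print(json.dumps(typedefs, indent=2))
```

The point of the sketch is that the second end of each relationship refers to
the parent 'DataSet' type, so any 'DataSet' subtype such as 'hdfs_path' or
'fs_path' can serve as the ML model directory.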
See:
* ATLAS-3640
* [https://github.com/apache/atlas/pull/88#issuecomment-592699723]