vladhlinsky commented on issue #88: ATLAS-3640 Update 
'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' relationship 
definitions
URL: https://github.com/apache/atlas/pull/88#issuecomment-592699723
 
 
   Thanks, @HeartSaVioR!
   
   As it turned out, the proposed changes do not work correctly for an upgrade 
to Spark models 1.1. 
   I had tested the changes only on a new installation (with no existing entities 
in HBase).
   
   I guess `relationshipCategory` can be updated only via a patch. An attempt 
to upgrade with proposed changes leads to:
   ```
   2020-02-28 09:57:11,783 ERROR - [main:] ~ graph rollback due to exception  
(GraphTransactionInterceptor:167)
   org.apache.atlas.exception.AtlasBaseException: invalid  update for 
relationship spark_ml_model_ml_directory: new relationshipDef category 
AGGREGATION, existing relationshipDef category COMPOSITION
           at 
org.apache.atlas.repository.store.graph.v2.AtlasRelationshipDefStoreV2.preUpdateCheck(AtlasRelationshipDefStoreV2.java:432)
   ```
   It's possible to use the following patch to update this property:
   ```
   {
       "patches": [
           {
               "id":              "TYPEDEF_PATCH_1000_015_001",
               "description":     "Update relationshipCategory to AGGREGATION",
               "action":          "REMOVE_LEGACY_REF_ATTRIBUTES",
               "typeName":        "spark_ml_model_ml_directory",
               "applyToVersion":  "1.0",
               "updateToVersion": "1.1",
               "params": {
                   "relationshipCategory": "AGGREGATION"
               }
           },
           ...
           }
       ]
   }
   
   ```
   However, there seems to be no way to update `endDefs` types. I cannot find a 
patch action for this purpose, and an attempt to update them directly in the 
model file leads to:
   ```
   2020-02-28 12:14:05,151 INFO  - [main:] ~ GraphTransaction intercept for 
org.apache.atlas.repository.store.graph.v2.AtlasTypeDefGraphStoreV2.createUpdateTypesDef
 (GraphTransactionAdvisor$1:41)
   2020-02-28 12:14:05,213 ERROR - [main:] ~ graph rollback due to exception  
(GraphTransactionInterceptor:167)
   org.apache.atlas.exception.AtlasBaseException: invalid update for 
relationshipDef spark_ml_model_ml_directory: new end2 
AtlasRelationshipEndDef{type='DataSet', name==>'model', description==>'null', 
isContainer==>'false', cardinality==>'SINGLE', isLegacyAttribute==>'false'}, 
existing end2 AtlasRelationshipEndDef{type='spark_ml_directory', 
name==>'model', description==>'null', isContainer==>'false', 
cardinality==>'SINGLE', isLegacyAttribute==>'false'}
           at 
org.apache.atlas.repository.store.graph.v2.AtlasRelationshipDefStoreV2.preUpdateCheck(AtlasRelationshipDefStoreV2.java:457)
   
   ```
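
The two failures above suggest a rule that can be sketched in a few lines. This is a simplified Python illustration of the behaviour implied by the stack traces, not the Atlas code: the class names, the `in_type_patching` flag, and the `end1` values of the existing definition are assumptions made for the example. Only the category and `end2` values come from the logs.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EndDef:
    type: str
    name: str
    is_container: bool
    cardinality: str = "SINGLE"


@dataclass(frozen=True)
class RelationshipDef:
    name: str
    category: str
    end1: EndDef
    end2: EndDef


class InvalidUpdate(Exception):
    """Stand-in for AtlasBaseException in this sketch."""


def pre_update_check(new, existing, in_type_patching=False):
    # Assumption: a category change is tolerated only while a typedef patch runs.
    if new.category != existing.category and not in_type_patching:
        raise InvalidUpdate(
            f"invalid update for relationship {existing.name}: "
            f"new relationshipDef category {new.category}, "
            f"existing relationshipDef category {existing.category}"
        )
    # Assumption: end definitions must be identical; nothing relaxes this check.
    for label in ("end1", "end2"):
        if getattr(new, label) != getattr(existing, label):
            raise InvalidUpdate(
                f"invalid update for relationshipDef {existing.name}: "
                f"new {label} differs from existing {label}"
            )


existing = RelationshipDef(
    name="spark_ml_model_ml_directory",
    category="COMPOSITION",
    end1=EndDef("spark_ml_model", "directory", True),   # assumed values
    end2=EndDef("spark_ml_directory", "model", False),  # values from the log
)
category_change = replace(existing, category="AGGREGATION")
end_change = replace(existing, end2=replace(existing.end2, type="DataSet"))


def attempt(new, **kwargs):
    """Return 'accepted' or 'rejected: <reason>' instead of raising."""
    try:
        pre_update_check(new, existing, **kwargs)
        return "accepted"
    except InvalidUpdate as exc:
        return f"rejected: {exc}"


print(attempt(category_change))                         # direct model-file edit
print(attempt(category_change, in_type_patching=True))  # same change via a patch
print(attempt(end_change, in_type_patching=True))       # end type change
```

Under these assumptions the category change passes only when applied as a patch, while the end type change is rejected either way, which matches what was observed during the upgrade.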
   
   Thus, it seems that the safest way to resolve this issue is **creating 
a new relationship**.
   I tried adding the following relationship defs, which use the same name 
`directory`:
   ```
       {
         "name": "spark_ml_model_dataset",
         "serviceType": "spark",
         "typeVersion": "1.0",
         "relationshipCategory": "AGGREGATION",
         "endDef1": {
           "type": "spark_ml_model",
           "name": "directory",
           "isContainer": true,
           "cardinality": "SINGLE"
         },
         "endDef2": {
           "type": "DataSet",
           "name": "model",
           "isContainer": false,
           "cardinality": "SINGLE"
         },
         "propagateTags": "NONE"
       },
       {
         "name": "spark_ml_pipeline_dataset",
         "serviceType": "spark",
         "typeVersion": "1.0",
         "relationshipCategory": "AGGREGATION",
         "endDef1": {
           "type": "spark_ml_pipeline",
           "name": "directory",
           "isContainer": true,
           "cardinality": "SINGLE"
         },
         "endDef2": {
           "type": "DataSet",
           "name": "pipeline",
           "isContainer": false,
           "cardinality": "SINGLE"
         },
         "propagateTags": "NONE"
       }
   ``` 
   
   This works perfectly fine for `spark_ml_model` but fails for 
`spark_ml_pipeline` with the following error:
   ```
   2020-02-28 21:34:00,933 WARN  - [NotificationHookConsumer thread-0:] ~ Max 
retries exceeded for message 
{"version":{"version":"1.0.0","versionParts":[1]},"msgCompressionKind":"NONE","msgSplitIdx":1,"msgSplitCount":1,"msgCreationTime":1582918440918,"message":{"type":"ENTITY_CREATE_V2","user":"test","entities":{"entities":[{"typeName":"spark_ml_model","attributes":{"qualifiedName":"model_with_path8","name":"model_with_path8"},"guid":"-386799758271978","isIncomplete":false,"provenanceType":0,"version":0,"relationshipAttributes":{"directory":{"typeName":"hdfs_path","uniqueAttributes":{"qualifiedName":"path8"}}},"proxy":false}]}}}
 (NotificationHookConsumer$HookConsumer:793)
   org.apache.atlas.exception.AtlasBaseException: invalid relationshipDef: 
spark_ml_model_ml_directory: end type 1: spark_ml_directory, end type 2: 
spark_ml_model
        at 
org.apache.atlas.repository.store.graph.v2.AtlasRelationshipStoreV2.validateRelationship(AtlasRelationshipStoreV2.java:657)
   
   ```
   
   Debugging shows that 
[AtlasEntityUtil.getRelationshipType](https://github.com/apache/atlas/blob/master/repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphMapper.java#L659)
 returns `null` for the `hdfs_path` attribute (`hdfs_path` is a child of 
`DataSet`), and this makes 
[entityType.getRelationshipAttribute](https://github.com/apache/atlas/blob/master/intg/src/main/java/org/apache/atlas/type/AtlasEntityType.java#L459)
 return the first value of the HashMap. 
   
   For `spark_ml_model`, this happens to be the right relationship, but for 
`spark_ml_pipeline` it is the wrong one. See 
screenshots:
   ![Screenshot from 2020-02-28 
21-53-21](https://user-images.githubusercontent.com/61428392/75582735-e56c2f00-5a74-11ea-9123-4fe3bf33881c.png)
   ![Screenshot from 2020-02-28 
21-53-54](https://user-images.githubusercontent.com/61428392/75582748-eac97980-5a74-11ea-86de-a12633c0b0d6.png)
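
The fallback described above can be illustrated with a small Python sketch. It is written from the description in this comment rather than copied from `AtlasEntityType`, so the function shape and the nested-dict layout are assumptions. The legacy relationship name `spark_ml_pipeline_ml_directory` comes from the issue title, and the attribute values are placeholder strings.

```python
def get_relationship_attribute(relationship_attributes, attribute_name, relationship_type):
    """Pick the attribute definition for attribute_name.

    relationship_attributes maps attribute name -> {relationship type -> attribute}.
    """
    candidates = relationship_attributes.get(attribute_name)
    if not candidates:
        return None
    if relationship_type is not None and relationship_type in candidates:
        return candidates[relationship_type]
    # Fallback when the relationship type is unknown: take whichever entry
    # comes first. In a Java HashMap that order depends on key hashes, so the
    # caller cannot rely on which relationship is returned.
    return next(iter(candidates.values()))


# Both relationships expose an attribute called "directory" on spark_ml_pipeline.
pipeline_attrs = {
    "directory": {
        "spark_ml_pipeline_ml_directory": "legacy attribute (end type spark_ml_directory)",
        "spark_ml_pipeline_dataset": "new attribute (end type DataSet)",
    }
}

# With an explicit type the new relationship is found.
print(get_relationship_attribute(pipeline_attrs, "directory", "spark_ml_pipeline_dataset"))
# With a null type the first entry wins, here the legacy one.
print(get_relationship_attribute(pipeline_attrs, "directory", None))
```

Python dicts keep insertion order, so the sketch always returns the legacy entry for a `None` type. In the Java code the winner depends on HashMap iteration order, which would explain why one entity type resolves correctly and the other does not.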
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
