Hello David,

There is only one instance of a classification allowed on an entity. A propagated classification cannot override an explicitly set classification. When it comes to managing conflicts, there is nothing special about propagated classifications. A new entity, a new classification on an entity, or a new relationship needs to be validated, and if it is invalid then the update is rejected. Because the model is distributed, it is possible that updates in different servers may conflict and be discovered later as we synchronise metadata between members of the cohort. These conflicts are reported through the OMRS Event Protocol and corrected through exception management processes.
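A minimal Python sketch of the validation rules in the paragraph above, under stated assumptions: the Entity, Classification and InvalidUpdate names and the "explicit"/"propagated" origin labels are hypothetical and are not the OMRS API. Classifications are keyed by name so an entity holds at most one instance of each; this sketch simply rejects any second instance, and reports the propagated-over-explicit case with its own message.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


class InvalidUpdate(Exception):
    """Hypothetical error raised when a proposed update fails validation."""


@dataclass
class Classification:
    name: str
    origin: str  # assumed labels: "explicit" or "propagated"
    source_guid: Optional[str] = None  # entity a propagated copy came from


@dataclass
class Entity:
    guid: str
    # Keyed by classification name, so an entity holds at most one instance of each.
    classifications: Dict[str, Classification] = field(default_factory=dict)

    def classify(self, new: Classification) -> None:
        """Validate a new classification and reject the update if it is invalid."""
        existing = self.classifications.get(new.name)
        if existing is not None:
            if new.origin == "propagated" and existing.origin == "explicit":
                raise InvalidUpdate(
                    f"propagated {new.name} cannot override the explicit one on {self.guid}"
                )
            raise InvalidUpdate(f"{self.guid} already has a {new.name} classification")
        self.classifications[new.name] = new


log = Entity("notelog-1")
log.classify(Classification("Confidentiality", "explicit"))
try:
    log.classify(Classification("Confidentiality", "propagated", source_guid="note-7"))
except InvalidUpdate as err:
    print("rejected:", err)
```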
In the example of the note log, assume we are using the confidentiality classification defined in area 4, which has a sliding scale of enum values as you state, and that the NoteLog has an explicit classification of "internal use". Then it would be invalid to add a note with a higher classification value, because the note log's classification is the high-water mark for the note log. So the request to add the confidential note would be rejected. If the note log did not have any confidentiality classification, then the confidential note could be added, and classification propagation up the hierarchy would, in effect, make the note log confidential.

The classifications of confidentiality, retention and criticality are defined as valid for entities that inherit from Referenceable. This is not a recent change; see model 422.

I agree we need to systematically work through the scenarios. That was the point of my original note on this topic. The BOTH option was being removed based on thinking through only 2 use cases that were not representative of the governance requirements. I came up with 2 counter-examples in a few minutes and I am sure there are more. I have not found a case yet where the existing configuration does not work, but I am not confident I have been through all of the scenarios either. This function needs a proper design and community review to get it right.

All the best
Mandy
___________________________________________
Mandy Chessell CBE FREng CEng FBCS
IBM Distinguished Engineer
Master Inventor
Member of the IBM Academy of Technology
Visiting Professor, Department of Computer Science, University of Sheffield
Email: [email protected]
LinkedIn: http://www.linkedin.com/pub/mandy-chessell/22/897/a49
Assistant: Janet Brooks - [email protected]

From: David Radley/UK/IBM
To: Mandy Chessell/UK/IBM@IBMGB
Cc: [email protected]
Date: 15/01/2018 11:49
Subject: Re: Tag propagation

Hi Mandy,

I think your use cases make sense.
For the first use case, I am not sure what the confidential classification is here. Is it a classification that is shipped with the open types? I assume that confidentiality would be a classification that has an ordered set of enumerated values, like "no classification", "internal use", "confidential". In this case, if a NoteEntry and a NoteLog had the confidentiality classification but with different values, we would need to design for what happens; having BOTH on the Attached NoteLogEntry RelationshipDef does not seem sufficient. Maybe we have an implied escalation based on the enum order.

For the second case around DataSet and DataStore, I have the same concern: how do we determine what we should do when there are different levels of retention or criticality specified on each entity?

I am also concerned about confidentiality, retention and criticality: I assume these classifications would be defined as applicable to Referenceable or to any entity type. I am not sure on which RelationshipDefs these would flow, but there is a risk that they could inadvertently propagate more widely than we would like. I think it would be useful to understand all the proposed open metadata RelationshipDef tag propagations, so we know these use cases are reasonably addressed. I suspect we will want to associate classifications with RelationshipDefs so that a RelationshipDef can limit which classifications it propagates. There is also the idea that we may want to override the classifications that have been propagated on an individual entity. I suggest we need mechanisms in addition to BOTH PropagateTags on a RelationshipDef for your use cases.

all the best,
David.

From: Mandy Chessell <[email protected]>
To: [email protected]
Cc: "Madhan Neethiraj" <[email protected]>, "Sarath Subramanian" <[email protected]>
Date: 15/01/2018 11:12
Subject: Re: Tag propagation

Hello David,

I am not sure how many examples you need, but here are a couple of patterns ...
The first pattern is where we have a cluster of entities that make up a logical collection of information, such as a NoteLog and its Notes nested inside (area 1), and a classification applied to any one element needs to be propagated both up and down. For example, making a note log confidential makes all the notes inside confidential, and making any note confidential makes the note log confidential (but not all of the other notes inside; if the confidential note is deleted then the note log is no longer confidential). We will see similar behaviours with the dependency relationships between nested locations in area 0.

The second pattern is where the relationship shows physical dependencies between entities that need to be respected. For example, the relationship between DataSet and DataStore (area 2). If a data set has a retention classification or criticality classification (area 4), then it needs to flow to the underlying data stores. If the underlying data stores have confidence classifications, then they should propagate to the DataSets. We will see similar behaviours with the dependency relationships between server capabilities in area 0.

Make sense?
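A minimal sketch of the NoteLog behaviour described above, assuming effective classifications are computed when they are read (one of the options raised later in this thread) rather than stored. The NoteLog class and its methods are hypothetical. Classifications flow up from any note to the log and down from the log's explicit classifications to every note, but not sideways between sibling notes, so deleting the confidential note removes the log's derived confidentiality.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class NoteLog:
    # Classifications set explicitly on the note log itself.
    explicit: Set[str] = field(default_factory=set)
    # Note id -> classifications set explicitly on that note.
    notes: Dict[str, Set[str]] = field(default_factory=dict)

    def effective_for_log(self) -> Set[str]:
        """Up: the log's own classifications plus any held by one of its notes."""
        result = set(self.explicit)
        for tags in self.notes.values():
            result |= tags
        return result

    def effective_for_note(self, note_id: str) -> Set[str]:
        """Down: the note's own classifications plus the log's explicit ones.

        Classifications the log only derived from sibling notes are not pushed
        back down, so one confidential note does not make its siblings confidential.
        """
        return self.notes[note_id] | self.explicit

    def delete_note(self, note_id: str) -> None:
        del self.notes[note_id]


log = NoteLog(notes={"n1": set(), "n2": {"Confidential"}})
print(log.effective_for_log())       # the log is confidential because of n2
print(log.effective_for_note("n1"))  # n1 is not affected by its sibling
log.delete_note("n2")
print(log.effective_for_log())       # nothing left to propagate up
```

Because nothing is stored, removing a note needs no clean-up of derived classifications; a stored design (for example the propagated-classification edges discussed below) would have to remove them explicitly.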
All the best
Mandy

From: David Radley <[email protected]>
To: Mandy Chessell <[email protected]>
Cc: [email protected], "Madhan Neethiraj" <[email protected]>, "Sarath Subramanian" <[email protected]>
Date: 15/01/2018 10:05
Subject: Re: Tag propagation

Hi Mandy,

From what I recall, we discussed some scenarios where we felt tag propagation would be useful. I think the use cases we are thinking of are now indicated by the model files that have "propagateTags" set. The examples include the semanticClassification and the "hbase_table_column_families" relationships. We had not identified any use cases we felt were important where BOTH would be useful for a relationship, so we were thinking of removing that option. Do you have some relationships in the open types that require BOTH? It would be useful for me to understand why those relationships need BOTH.

many thanks,
David.

From: Mandy Chessell/UK/IBM
To: [email protected]
Cc: David Radley <[email protected]>, atlas <[email protected]>, Sarath Subramanian <[email protected]>
Date: 14/01/2018 13:25
Subject: Re: Tag propagation

Hello Madhan, David,

I would not wish to remove the option to have tag propagation flow in both directions. Most metadata relationships are not hierarchical. They are two-way, and different situations will cause different classifications to flow in each direction.
I do not remember the discussion on removing the BOTH option, but if I missed it I apologise. What is the justification? The enforcement of the classification's entity types should not prevent the tag from propagating through an entity that does not support it. Downstream entities may support the tag and need it to be propagated to them. We need to work through more scenarios because we also need a way to bound tag propagation :)

As an FYI, the OMRS API for classifications includes an origin attribute that lets us distinguish, among the classifications returned with an entity, those explicitly assigned to it from those propagated to it. Most callers will not care, but some might.

All the best
Mandy

From: Madhan Neethiraj <[email protected]>
To: David Radley <[email protected]>, Sarath Subramanian <[email protected]>
Cc: atlas <[email protected]>
Date: 13/01/2018 02:14
Subject: Re: Tag propagation

David,

Sarath was working on tag propagation, but had to take up tasks related to JanusGraph and others. He will be resuming tag-propagation work next week; this feature will be part of the Atlas-1.0.0 release.

- lose BOTH - this is still in the code - I think we agreed we wanted to get rid of this.
Agree.

- should honour the classification entitytypes - so that we do not get classifications applied to inappropriate entityTypes
Perhaps we should stop the propagation at the entity where the classification is not applicable? I think it would not be correct to block associating a classification with an entity just because the classification is not applicable to a downstream entity.

- There is the question about how the propagated classifications would look in the get entity REST API - I suggest that they appear in the entity's classifications with a field indicating that they are derived (and hence not able to be removed by an entity update).
I was thinking about a separate attribute, AtlasEntity.propagatedClassifications, for this. However, I think your suggestion of adding a field to AtlasClassification is a better one; with this approach no changes would be needed in applications that process classifications on an entity. How about we capture the guid of the source entity with which the classification is associated, AtlasClassification.sourceEntityGuid? If this value is null, then the classification is associated with the current entity directly.

- I would hope that Ranger would pick up these new propagated tags using the existing tag sync.
Yes. With the approach detailed above, no changes would be needed in Ranger.

- I think you wanted the derived classifications to be picked up at query time. I also remember suggesting that we store the derived classifications in a derivedClassification property in the entity which would contain the list of derived classifications. Or we could store them as a new type of edge, "propagated classification" edges to the real classification. I like the edge idea.
To enable queries like ‘get list of entities that are classified as PII’, it would be more performant if each entity vertex also held data about its propagated classifications, similar to how entities currently hold data on the classifications directly associated with them.
However, all the entities should directly reference a single instance of a classification, so that it will be easier to manage changes to classification attribute values. Sarath will send an update on the design choices later next week.

- If we had the above, we could classify a Term as PSI, and use the semantic mapping to propagate the classifications to the hive column. The hive column would not pick up classifications defined in the area 3 model like "SpineObject", which is defined as only applying to "GlossaryTerm".
Yes. This use case should be covered by the design discussed above.

Thanks,
Madhan

From: David Radley <[email protected]>
Date: Thursday, January 11, 2018 at 8:52 AM
To: Madhan Neethiraj <[email protected]>
Cc: atlas <[email protected]>
Subject: Tag propagation

Hi Madhan,

I had a look in the code and was surprised that tag propagation was not in. Is this something you are looking at in the near future? If not, I may need to look into it. I suggest the following for phase 1 of the tag propagation implementation:
- lose BOTH - this is still in the code - I think we agreed we wanted to get rid of this.
- should honour the classification entitytypes - so that we do not get classifications applied to inappropriate entityTypes
- There is the question about how the propagated classifications would look in the get entity REST API - I suggest that they appear in the entity's classifications with a field indicating that they are derived (and hence not able to be removed by an entity update).
- I would hope that Ranger would pick up these new propagated tags using the existing tag sync.
- I think you wanted the derived classifications to be picked up at query time. I also remember suggesting that we store the derived classifications in a derivedClassification property in the entity which would contain the list of derived classifications. Or we could store them as a new type of edge, "propagated classification" edges to the real classification. I like the edge idea.
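A minimal sketch of a propagation walk that combines the points raised in this thread: a classification is only assigned to entities whose type it applies to, the walk still passes through entities where it does not apply so that downstream entities can receive it, and each propagated copy records the GUID of the entity it came from (None for a direct assignment, mirroring the proposed AtlasClassification.sourceEntityGuid). The propagate function, the Assignment type, the type names and the treatment of every relationship as propagating in both directions are assumptions for illustration; the visited set only stops cycles and is not a policy for bounding propagation.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Assignment:
    entity_guid: str
    classification: str
    # None for a direct assignment, otherwise the GUID of the originating entity
    # (mirrors the proposed AtlasClassification.sourceEntityGuid).
    source_entity_guid: Optional[str]


def propagate(
    source_guid: str,
    classification: str,
    entity_types: Dict[str, str],  # guid -> entity type name
    applicable_types: Set[str],    # entity types the classification is valid for
    edges: Dict[str, List[str]],   # guid -> neighbours over propagating relationships
) -> List[Assignment]:
    """Breadth-first walk that assigns only where valid but passes through everything."""
    if entity_types[source_guid] not in applicable_types:
        raise ValueError(f"{classification} is not valid for {entity_types[source_guid]}")
    assignments = [Assignment(source_guid, classification, None)]
    seen = {source_guid}  # stops the walk looping on cycles; not a bounding policy
    queue = deque([source_guid])
    while queue:
        current = queue.popleft()
        for neighbour in edges.get(current, []):
            if neighbour in seen:
                continue
            seen.add(neighbour)
            queue.append(neighbour)  # keep walking even if this entity cannot hold it
            if entity_types[neighbour] in applicable_types:
                assignments.append(Assignment(neighbour, classification, source_guid))
    return assignments


types = {"term-1": "GlossaryTerm", "col-1": "hive_column"}
links = {"term-1": ["col-1"], "col-1": ["term-1"]}
for name, valid_for in [("PII", {"GlossaryTerm", "hive_column"}), ("SpineObject", {"GlossaryTerm"})]:
    print(name, [a.entity_guid for a in propagate("term-1", name, types, valid_for, links)])
```

Where the walk should stop (per RelationshipDef, per direction or per classification) is the open design question in this thread and is not modelled here.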
If we had the above, we could classify a Term as PSI, and use the semantic mapping to propagate the classifications to the hive column. The hive column would not pick up classifications defined in the area 3 model like "SpineObject", which is defined as only applying to "GlossaryTerm". What do you think?

all the best,
David.

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
