Re: Tag propagation

Mandy Chessell Mon, 15 Jan 2018 04:25:54 -0800

Hello David,
There is only one instance of a classification allowed on an entity.  A 
propagated classification cannot override an explicitly set 
classification.  When it comes to managing conflicts, there is nothing 
special about propagated classifications.  A new entity, a new 
classification on an entity, or a new relationship needs to be validated, 
and if it is invalid the update is rejected.  Because the model is 
distributed, updates made in different servers may conflict, and the 
conflict may only be discovered later as we synchronise metadata between 
members of the cohort.  These conflicts are reported through the OMRS 
Event Protocol and corrected through exception management processes.
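
As an illustration only (this is not Atlas or OMRS code), the two rules above, one instance of a classification per entity and explicit beats propagated, could be sketched in Python as follows. The class names, the `explicit` flag and the `apply` method are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Classification:
    name: str        # classification type, e.g. "Confidentiality"
    value: str       # e.g. "internal use"
    explicit: bool   # True if set directly, False if propagated (assumed flag)


@dataclass
class Entity:
    guid: str
    # Keyed by classification name, so there is at most one instance per type.
    classifications: Dict[str, Classification] = field(default_factory=dict)

    def apply(self, new: Classification) -> bool:
        """Apply a classification; return False if it was not applied."""
        current = self.classifications.get(new.name)
        if current is not None and current.explicit and not new.explicit:
            # A propagated value never replaces an explicitly set one.
            return False
        self.classifications[new.name] = new
        return True


entity = Entity("e1")
entity.apply(Classification("Confidentiality", "internal use", explicit=True))
accepted = entity.apply(
    Classification("Confidentiality", "confidential", explicit=False)
)
```

Here `accepted` is `False` and the entity keeps its explicit "internal use" value. Conflicts that are only discovered during synchronisation across servers are outside the scope of this sketch.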


In the example of the note log, and assuming we are using the 
confidentiality classification defined in area 4, which has a sliding scale 
of enums as you state, and the NoteLog has an explicit classification of 
"internal use", then it would be invalid to add a note that has a higher 
value of the classification because the note log's classification is the 
high water mark for the note log.  So the request to add the confidential 
note would be rejected.  If the note log did not have any confidentiality 
classification then the confidential note could be added, and 
classification propagation up the hierarchy would in effect make the 
note log confidential.
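
A minimal sketch of this high water mark check, assuming the three ordered values suggested earlier in the thread ("no classification", "internal use", "confidential"). The enum and the function name are hypothetical and do not come from the open types.

```python
from enum import IntEnum
from typing import Optional


class Confidentiality(IntEnum):
    # Assumed ordering: a larger value is more restrictive.
    NO_CLASSIFICATION = 0
    INTERNAL_USE = 1
    CONFIDENTIAL = 2


def may_add_note(log_explicit: Optional[Confidentiality],
                 note_level: Confidentiality) -> bool:
    """Return True if a note with note_level may be added to the note log."""
    if log_explicit is None:
        # No explicit level on the note log, so there is nothing to exceed;
        # the note's level would then propagate up to the log.
        return True
    # The explicit level is the high water mark for everything in the log.
    return note_level <= log_explicit


rejected = not may_add_note(Confidentiality.INTERNAL_USE,
                            Confidentiality.CONFIDENTIAL)
allowed = may_add_note(None, Confidentiality.CONFIDENTIAL)
```

Both `rejected` and `allowed` are `True`, matching the two cases described above.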

The classifications of confidentiality, retention and criticality are 
defined as valid for entities that inherit from Referenceable.  This is 
not a recent change - see model 422.  I agree we need to systematically 
work through the scenarios.  That was the point of my original note on 
this topic.  The BOTH option was being removed based on thinking through 
only 2 use cases that were not representative of the governance 
requirements.  I came up with 2 counter-examples in a few minutes and I 
am sure there are more.  I have not yet found a case where the existing 
configuration does not work - but I am not confident I have been through 
all of the scenarios either.

This function needs a proper design and community review to get it right.  


All the best
Mandy
___________________________________________
Mandy Chessell CBE FREng CEng FBCS
IBM Distinguished Engineer

Master Inventor
Member of the IBM Academy of Technology
Visiting Professor, Department of Computer Science, University of 
Sheffield

Email: [email protected]
LinkedIn: http://www.linkedin.com/pub/mandy-chessell/22/897/a49

Assistant: Janet Brooks - [email protected]



From:   David Radley/UK/IBM
To:     Mandy Chessell/UK/IBM@IBMGB
Cc:     [email protected]
Date:   15/01/2018 11:49
Subject:        Re: Tag propagation


Hi Mandy,
I think your use cases make sense.

For the first use case, I am not sure what the confidential classification 
is here - is it a classification that is shipped with the open types? I 
assume that confidentiality would be a classification that has an ordered 
set of enumerated values, like "no classification", "internal use", 
"confidential". In this case, if a NoteEntry and a NoteLog both had the 
confidentiality classification but with different values, we would 
need to design for what happens; having BOTH on the Attached NoteLogEntry 
RelationshipDef does not seem sufficient. Maybe we have an implied 
escalation based on the enum order.
For the second case around dataset and datastore, I have the same concern 
- how do we determine what we should do when there are different levels of 
retention or criticality specified on each entity?

I also have a concern about confidentiality, retention and criticality. I 
assume these classifications would be defined as being applicable to 
Referenceable or to any entity type. I am not sure which 
RelationshipDefs these would flow on, but there is a risk that they could 
inadvertently propagate more widely than we would like. I think it would 
be useful to understand all the proposed open metadata RelationshipDef 
tag propagations to know these use cases are reasonably addressed. I 
suspect we will want to associate classifications with RelationshipDefs so 
that RelationshipDefs can limit which classifications they propagate. 
There is also the idea that we may want to override the classifications 
that have been propagated on an individual entity. 

I suggest we need further mechanisms in addition to BOTH PropagateTags 
on a RelationshipDef for your use cases. 

  all the best, David. 






From:   Mandy Chessell <[email protected]>
To:     [email protected]
Cc:     "Madhan Neethiraj" <[email protected]>, "Sarath Subramanian" 
<[email protected]>
Date:   15/01/2018 11:12
Subject:        Re: Tag propagation



Hello David,

I am not sure how many examples you need.  But here are a couple of 
patterns ...

The first is where we have a cluster of entities that make up a logical 
collection of information - such as a NoteLog and its Notes nested inside 
(area 1) - and a classification applied to any one element needs to be 
propagated both up and down.  For example, making a note log confidential 
makes all the notes inside confidential and making any note confidential 
makes the note log confidential (but not all of the other notes inside - if 
the confidential note is deleted then the note log is no longer 
confidential).  We will see similar behaviours with the dependency 
relationships between nested locations in area 0.
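
One way to read this pattern is that the effective tags are derived from the explicit tags on each element. The sketch below is illustrative only: the function and its arguments are hypothetical, and it ignores ordered values such as confidentiality levels.

```python
from typing import Dict, Set, Tuple


def effective_tags(log_tags: Set[str],
                   note_tags: Dict[str, Set[str]]
                   ) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Derive effective tags for a note log and its notes from explicit tags."""
    # Upward: the log picks up every tag explicitly set on any of its notes.
    log_effective = set(log_tags)
    for tags in note_tags.values():
        log_effective |= tags
    # Downward: each note picks up only the log's explicit tags, so a tag that
    # reached the log from one note does not leak to the other notes.
    notes_effective = {name: tags | log_tags
                       for name, tags in note_tags.items()}
    return log_effective, notes_effective


notes = {"n1": {"confidential"}, "n2": set()}
log_before, notes_before = effective_tags(set(), notes)

del notes["n1"]  # delete the confidential note
log_after, _ = effective_tags(set(), notes)
```

Before the delete the log is confidential and `n2` is not; after the delete the log is no longer confidential, because nothing was ever stored on it explicitly.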



A second example is where the relationship is showing physical 
dependencies between entities that need to be respected.  For example, the 
relationship between DataSet and DataStore (area 2).  If a data set has a 
retention classification or criticality classification (area 4) then it 
needs to flow to the underlying data stores.  If the underlying data stores 
have confidence classifications then they should propagate to the 
DataSets.  We will see similar behaviours with the dependency 
relationships between server capabilities in area 0.
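
To make the two directions concrete, here is a small sketch in which each classification has its own direction of flow across a single DataSet to DataStore relationship. The `FLOW` table and the direction names are assumptions made for illustration; nothing like them is defined in the thread or in the models.

```python
from typing import Dict, Set, Tuple

# Hypothetical per-classification flow rules for one relationship.
FLOW: Dict[str, str] = {
    "Retention": "dataset_to_datastore",
    "Criticality": "dataset_to_datastore",
    "Confidence": "datastore_to_dataset",
}


def propagate(dataset_tags: Set[str],
              datastore_tags: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return the effective tags of (data set, data store) after propagation."""
    down = {t for t in dataset_tags if FLOW.get(t) == "dataset_to_datastore"}
    up = {t for t in datastore_tags if FLOW.get(t) == "datastore_to_dataset"}
    return dataset_tags | up, datastore_tags | down


dataset, datastore = propagate({"Retention"}, {"Confidence"})
```

After the call both entities carry both tags, but each tag has only travelled in its own direction: a `Confidence` tag set on the data set would not flow down to the data store.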



Make sense?



All the best
Mandy

From:   David Radley <[email protected]>
To:     Mandy Chessell <[email protected]>
Cc:     [email protected], "Madhan Neethiraj" <[email protected]>, 
"Sarath Subramanian" <[email protected]>
Date:   15/01/2018 10:05
Subject:        Re: Tag propagation







Hi Mandy,



From what I recall, we discussed some scenarios where we felt tag 
propagation would be useful. I think the use cases we are thinking of are 
now indicated by the model files that have "propagateTags" set. The 
examples include the semanticClassification and the 
"hbase_table_column_families" relationships. We had not identified any 
important use cases where BOTH would be useful for a relationship, so we 
were thinking of removing that option. Do you have some relationships 
that require BOTH in the open types? It would be useful for me to 
understand why those relationships need BOTH.

         many thanks, David. 











From:   Mandy Chessell/UK/IBM
To:     [email protected]
Cc:     David Radley <[email protected]>, atlas 
<[email protected]>, Sarath Subramanian <[email protected]>
Date:   14/01/2018 13:25
Subject:        Re: Tag propagation











Hello Madhan, David,



I would not wish to remove the option to have tag propagation flow in both 
directions.  Most metadata relationships are not hierarchical.  They are 
two-way and different situations will call for different classifications 
to flow in each direction.  I do not remember the discussion on removing 
the BOTH option - but if I missed it I apologise.  What is the 
justification?







The enforcement of the classification's entity types should not prevent 
the propagation of the tag through an entity just because that entity does 
not support the tag.  Downstream entities may support the tag and need it 
to be propagated to them.  We need to work through more scenarios because 
we also need a way to bound tag propagation :)
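
The difference between passing a tag through an entity that does not support it and stopping at that entity (the alternative raised in the quoted message below) can be shown with a small traversal. This is a sketch under assumed names: the graph layout, the `pass_through` switch and the entity names are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def propagate_tag(start: str,
                  edges: Dict[str, List[str]],
                  entity_type: Dict[str, str],
                  applicable_types: Set[str],
                  pass_through: bool = True) -> Set[str]:
    """Return the entities downstream of start that receive the tag."""
    tagged: Set[str] = set()
    seen = {start}
    queue = deque(edges.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        supports = entity_type[node] in applicable_types
        if supports:
            tagged.add(node)
        # With pass_through, an unsupported entity still forwards the tag.
        if supports or pass_through:
            queue.extend(edges.get(node, []))
    return tagged


edges = {"term": ["process"], "process": ["column"]}
types = {"term": "GlossaryTerm", "process": "Process", "column": "hive_column"}
applicable = {"GlossaryTerm", "hive_column"}

through = propagate_tag("term", edges, types, applicable)
stopped = propagate_tag("term", edges, types, applicable, pass_through=False)
```

With `pass_through=True` the column is tagged even though the intermediate entity is skipped; with `pass_through=False` nothing downstream of the unsupported entity is reached. Neither variant addresses how propagation should be bounded.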







As an FYI, the OMRS API for classifications includes an origin attribute 
that lets us tell whether a classification returned with an entity was 
explicitly assigned or propagated to the entity.  Most callers will not 
care but some might.







All the best
Mandy



From:   Madhan Neethiraj <[email protected]>
To:     David Radley <[email protected]>, Sarath Subramanian 
<[email protected]>
Cc:     atlas <[email protected]>
Date:   13/01/2018 02:14
Subject:        Re: Tag propagation















David,







 







Sarath was working on tag-propagation, but had to take up tasks related to 
JanusGraph and others. He will be resuming tag-propagation work next week; 
this feature would be part of Atlas-1.0.0 release.







 







- lose BOTH - this is still in the code - I think we agreed we wanted to 
get rid of this. 

Agree.







 







- should honour the classification entitytypes - so that we do not get 
classifications applied to inappropriate entityTypes 

Perhaps we should stop the propagation at the entity where the 
classification is not applicable? I think it wouldn’t be correct to block 
a classification association to an entity if the classification is not 
applicable for a down-stream entity.







 







- There is the question about how the propagated classifications would 
look in the get entity rest API  - I suggest that they appear in the 
entities classification with a field indicating that they are derived (and 
hence not able to be removed by an entity update). 

I was thinking about a separate attribute, 
AtlasEntity.propagatedClassifications, for this. However, I think your 
suggestion of adding a field to AtlasClassification is a better one; with 
this approach no changes would be needed in applications that process 
classifications on an entity. How about we capture the guid of the source 
entity on which the classification is associated, 
AtlasClassification.sourceEntityGuid? If this value is null, then the 
classification is associated with the current entity directly.
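
A rough sketch of how the proposed field could behave from a caller's point of view. The dataclass below is a hypothetical stand-in written for illustration, not the real AtlasClassification class, and the guid values are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClassificationSketch:
    type_name: str
    # Stand-in for the proposed AtlasClassification.sourceEntityGuid:
    # None means the classification is associated directly with this entity.
    source_entity_guid: Optional[str] = None


def is_propagated(c: ClassificationSketch) -> bool:
    return c.source_entity_guid is not None


def type_names(classifications: List[ClassificationSketch]) -> List[str]:
    # A consumer that only reads type names does not need to change.
    return [c.type_name for c in classifications]


column_classifications = [
    ClassificationSketch("PII", source_entity_guid="term-guid-1"),
    ClassificationSketch("Retention"),
]

# Only directly associated classifications could be removed by an update.
removable = [c.type_name for c in column_classifications
             if not is_propagated(c)]
```

The point of the design is visible in `type_names`: code that simply lists classifications sees propagated and direct ones alike, while code that cares can check the extra field.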







 







- I would hope that Ranger would pick up these new propagated tags using 
the existing tag sync. 

Yes. With the approach detailed above, no changes would be needed in 
Ranger.







 







- I think you wanted the derived classifications to be picked up at query 
time. I also remember suggesting that we store the derived classifications 
in a derivedClassification property in the entity which would contain the 
list of derived classifications. Or we could store them as a new type of 
edge "propagated classification" edges to the real classification. I like 
the edge idea. 

To enable queries like ‘get list of entities that are classified as PII’, 
it will be performant if each entity vertex has data about the propagated 
classifications as well, similar to entities having data on 
classifications directly associated with the entity currently. However, 
all the entities should directly reference a single instance of a 
classification, so that it will be easier to manage changes to 
classification attribute values. Sarath will send an update on the design 
choices later next week.
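
The two requirements in this reply (fast lookup from data held on each entity vertex, and one shared classification instance) can be pictured with plain dictionaries. This is only a sketch: the structure, ids and attribute names are invented, and it says nothing about how the graph store would actually hold the data.

```python
from typing import Dict, Set

# One shared instance per applied classification, keyed by a made-up id.
classification_instances: Dict[str, dict] = {
    "c1": {"type": "PII", "attributes": {"level": "high"}},
}

# Per-entity vertex data: classification type -> id of the shared instance.
vertices: Dict[str, Dict[str, Dict[str, str]]] = {
    "term":   {"direct": {"PII": "c1"}, "propagated": {}},
    "column": {"direct": {}, "propagated": {"PII": "c1"}},
    "table":  {"direct": {}, "propagated": {}},
}


def entities_classified_as(type_name: str) -> Set[str]:
    # Answered from vertex data alone, with no relationship traversal.
    return {guid for guid, v in vertices.items()
            if type_name in v["direct"] or type_name in v["propagated"]}


def attributes_of(guid: str, type_name: str) -> dict:
    v = vertices[guid]
    ref = v["direct"].get(type_name) or v["propagated"].get(type_name)
    return classification_instances[ref]["attributes"]


pii_entities = entities_classified_as("PII")

# Changing the attribute value once is visible from every referencing entity.
classification_instances["c1"]["attributes"]["level"] = "low"
column_view = attributes_of("column", "PII")
```

The query finds both the term and the column, and the attribute change made in one place is seen through the column's propagated reference.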







 







If we had the above, we could classify a Term as PSI, and use the semantic 
mapping to propagate the classifications to the hive column. The hive 
column would not pick up classifications defined in the area 3 model like 
"SpineObject", which is defined as only applying to "GlossaryTerm". 

Yes. This usecase should be covered by the design discussed above.







 







Thanks,
Madhan







 







From: David Radley <[email protected]>
Date: Thursday, January 11, 2018 at 8:52 AM
To: Madhan Neethiraj <[email protected]>
Cc: atlas <[email protected]>
Subject: Tag propagation







 







Hi Madhan, 
I had a look in the code - I was surprised that the tag propagation was 
not in. Is this something you are looking at in the near future? If not I 
may need to look into it. I suggest that phase 1 of the tag propagation 
implementation should: 



- lose BOTH - this is still in the code - I think we agreed we wanted to 
get rid of this. 
- should honour the classification entitytypes - so that we do not get 
classifications applied to inappropriate entityTypes 
- There is the question about how the propagated classifications would 
look in the get entity rest API  - I suggest that they appear in the 
entities classification with a field indicating that they are derived (and 
hence not able to be removed by an entity update). 
- I would hope that Ranger would pick up these new propagated tags using 
the existing tag sync. 
- I think you wanted the derived classifications to be picked up at query 
time. I also remember suggesting that we store the derived classifications 
in a derivedClassification property in the entity which would contain the 
list of derived classifications. Or we could store them as a new type of 
edge "propagated classification" edges to the real classification. I like 
the edge idea. 







If we had the above, we could classify a Term as PSI, and use the semantic 
mapping to propagate the classifications to the hive column. The hive 
column would not pick up classifications defined in the area 3 model like 
"SpineObject", which is defined as only applying to "GlossaryTerm". 

What do you think?   all the best, David. 







Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


























