Re: Tag propagation

Mandy Chessell Mon, 15 Jan 2018 03:13:31 -0800

Hello David,
I am not sure how many examples you need.  But here are a couple of 
patterns ...


The first is where we have a cluster of entities that make up a logical 
collection of information - such as a NoteLog and the Notes nested inside 
it (area 1) - and a classification applied to any one element needs to be 
propagated both up and down.  For example, making a note log confidential 
makes all the notes inside confidential, and making any note confidential 
makes the note log confidential (but not the other notes inside - if the 
confidential note is deleted then the note log is no longer confidential).  
We will see similar behaviours with the dependency relationships between 
nested locations in area 0.
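
As a rough illustration of this first pattern, here is a minimal Python 
sketch.  The NoteLog and Note classes and the is_confidential methods are 
invented for the example and are not the Atlas or open metadata APIs.  The 
assumption is that only explicitly assigned tags are stored and propagated 
ones are derived from them, which is one way to get the "delete the note 
and the note log is no longer confidential" behaviour.

```python
class Note:
    def __init__(self, name, confidential=False):
        self.name = name
        self.explicit_confidential = confidential  # tag set directly on the note
        self.note_log = None

    def is_confidential(self):
        # Explicit tag, or propagated down from an explicitly tagged note log.
        return self.explicit_confidential or (
            self.note_log is not None and self.note_log.explicit_confidential
        )


class NoteLog:
    def __init__(self, name, confidential=False):
        self.name = name
        self.explicit_confidential = confidential  # tag set directly on the log
        self.notes = []

    def add(self, note):
        note.note_log = self
        self.notes.append(note)

    def remove(self, note):
        self.notes.remove(note)
        note.note_log = None

    def is_confidential(self):
        # Explicit tag, or propagated up from any explicitly tagged note.
        return self.explicit_confidential or any(
            n.explicit_confidential for n in self.notes
        )


log = NoteLog("design notes")
public, secret = Note("agenda"), Note("salaries", confidential=True)
log.add(public)
log.add(secret)
```

Because the downward flow only looks at the note log's explicit tag, a tag 
that arrived from one note is not pushed back down onto its siblings.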

A second example is where the relationship shows physical dependencies 
between entities that need to be respected.  For example, the relationship 
between DataSet and DataStore (area 2).  If a data set has a retention 
classification or criticality classification (area 4) then it needs to 
flow to the underlying data stores.  If the underlying data stores have 
confidence classifications then they should propagate to the DataSets.  
We will see similar behaviours with the dependency relationships between 
server capabilities in area 0.
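
A small Python sketch of this second pattern follows.  It assumes a 
per-classification rule table for a single relationship; the FLOW table, 
the direction names and the effective_tags function are hypothetical and 
only illustrate why one relationship may need flow in both directions.

```python
# Hypothetical rule table: which way each classification flows across the
# DataSet/DataStore relationship described above.
FLOW = {
    "Retention": "dataset_to_datastore",
    "Criticality": "dataset_to_datastore",
    "Confidence": "datastore_to_dataset",
}


def effective_tags(dataset_tags, datastore_tags):
    """Return (dataset, datastore) tag sets after one propagation step."""
    dataset = set(dataset_tags)
    datastore = set(datastore_tags)
    for tag in dataset_tags:
        if FLOW.get(tag) == "dataset_to_datastore":
            datastore.add(tag)
    for tag in datastore_tags:
        if FLOW.get(tag) == "datastore_to_dataset":
            dataset.add(tag)
    return dataset, datastore


# Criticality assigned on the data store does not flow up to the data set.
ds, store = effective_tags({"Retention"}, {"Confidence", "Criticality"})
```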

Make sense?

All the best
Mandy
___________________________________________
Mandy Chessell CBE FREng CEng FBCS
IBM Distinguished Engineer

Master Inventor
Member of the IBM Academy of Technology
Visiting Professor, Department of Computer Science, University of 
Sheffield

Email: [email protected]
LinkedIn: http://www.linkedin.com/pub/mandy-chessell/22/897/a49

Assistant: Janet Brooks - [email protected]



From:   David Radley <[email protected]>
To:     Mandy Chessell <[email protected]>
Cc:     [email protected], "Madhan Neethiraj" <[email protected]>, 
"Sarath Subramanian" <[email protected]>
Date:   15/01/2018 10:05
Subject:        Re: Tag propagation



Hi Mandy,

From what I recall, we discussed some scenarios where we felt tag
propagation would be useful. I think the use cases we are thinking of are
now indicated by the model files that have "propagateTags" set. The
examples include the semanticClassification and the
"hbase_table_column_families" relationships. We had not identified any
important use cases where BOTH would be useful for a relationship, so we
were thinking of removing that option. Do you have some relationships that
require BOTH in the open types? It would be useful for me to understand
why those relationships need BOTH.

         many thanks, David.





From:   Mandy Chessell/UK/IBM
To:     [email protected]
Cc:     David Radley <[email protected]>, atlas 
<[email protected]>, Sarath Subramanian <[email protected]>
Date:   14/01/2018 13:25
Subject:        Re: Tag propagation





Hello Madhan, David,

I would not wish to remove the option to have tag propagation flow in both
directions.  Most metadata relationships are not hierarchical.  They are
two-way, and different situations will cause different classifications to
flow in each direction.  I do not remember the discussion on removing the
BOTH option - but if I missed it I apologise.  What is the justification?



The enforcement of the classification's entity types should not prevent
the propagation of the tag through an entity just because that entity does
not support the tag.  Downstream entities may support the tag and need it
to be propagated to them.  We need to work through more scenarios because
we also need a way to bound tag propagation :)
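
To illustrate the point, here is a minimal Python sketch of a traversal
that carries a tag through entities that cannot hold it.  Everything in it
is an assumption made for the example: the propagate function, the entity
and type names, and in particular the max_depth hop limit, which is just
one hypothetical way of bounding propagation and not an agreed design.

```python
from collections import deque


def propagate(start, edges, entity_types, applicable_types, max_depth):
    """Return entities that receive the tag when it is assigned to `start`.

    edges: dict mapping an entity name to its downstream entity names.
    entity_types: dict mapping an entity name to its type name.
    applicable_types: types the classification may be attached to.
    max_depth: hypothetical bound on how many hops the tag may travel.
    """
    tagged = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        entity, depth = queue.popleft()
        if entity_types[entity] in applicable_types:
            tagged.append(entity)
        if depth == max_depth:
            continue
        # Keep walking even if `entity` itself could not take the tag.
        for nxt in edges.get(entity, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return tagged


# Illustrative graph: the "mapping" entity cannot hold the tag but passes it on.
edges = {"term": ["mapping"], "mapping": ["column"], "column": ["report"]}
types = {"term": "GlossaryTerm", "mapping": "Mapping",
         "column": "hive_column", "report": "Report"}
result = propagate("term", edges, types, {"GlossaryTerm", "hive_column"}, 2)
```

Here the tag reaches "column" even though "mapping" does not support it,
and the hop limit stops the walk before "report".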



As an FYI, the OMRS API for classifications includes an origin attribute
that lets us return classifications with an entity and indicate whether
each one was explicitly assigned or propagated to the entity.  Most
callers will not care but some might.



All the best
Mandy










From:   Madhan Neethiraj <[email protected]>
To:     David Radley <[email protected]>, Sarath Subramanian 
<[email protected]>
Cc:     atlas <[email protected]>
Date:   13/01/2018 02:14
Subject:        Re: Tag propagation







David,

Sarath was working on tag-propagation, but had to take up tasks related to
JanusGraph and others. He will be resuming tag-propagation work next week;
this feature would be part of the Atlas-1.0.0 release.



 



- lose BOTH - this is still in the code - I think we agreed we wanted to
get rid of this.
Agree.

- should honour the classification entityTypes - so that we do not get
classifications applied to inappropriate entityTypes
Perhaps we should stop the propagation at the entity where the
classification is not applicable? I think it wouldn’t be correct to block
a classification association to an entity if the classification is not
applicable for a downstream entity.



 



- There is the question about how the propagated classifications would
look in the get entity REST API - I suggest that they appear in the
entity's classifications with a field indicating that they are derived
(and hence not able to be removed by an entity update).
I was thinking about a separate attribute,
AtlasEntity.propagatedClassifications, for this. However, I think your
suggestion of adding a field to AtlasClassification is a better one; with
this approach no changes would be needed in applications that process
classifications on an entity. How about we capture the guid of the source
entity with which the classification is associated,
AtlasClassification.sourceEntityGuid? If this value is null, then the
classification is associated with the current entity directly.
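
A minimal Python sketch of this proposal is below. The Classification
dataclass only mirrors the suggested sourceEntityGuid field and is not the
actual AtlasClassification class; remove_classification and the guid value
are made up for the example. It assumes that a classification with no
source guid is directly assigned, and that a propagated one is rejected
when an entity update tries to remove it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Classification:
    # Mirrors the proposal above; not the actual AtlasClassification class.
    type_name: str
    source_entity_guid: Optional[str] = None  # None means directly assigned

    def is_propagated(self) -> bool:
        return self.source_entity_guid is not None


def remove_classification(classifications, type_name):
    """Remove a directly assigned classification; refuse propagated ones."""
    kept = []
    for c in classifications:
        if c.type_name == type_name:
            if c.is_propagated():
                raise ValueError(
                    f"{type_name} is propagated from {c.source_entity_guid}"
                )
            continue
        kept.append(c)
    return kept


# "guid-of-term" is a placeholder, not a real Atlas guid.
tags = [Classification("PII", source_entity_guid="guid-of-term"),
        Classification("Retention")]
```

Callers that ignore the extra field still see one flat list of
classifications, which is the reason given above for preferring it over a
separate attribute.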



 



- I would hope that Ranger would pick up these new propagated tags using
the existing tag sync.
Yes. With the approach detailed above, no changes would be needed in
Ranger.

- I think you wanted the derived classifications to be picked up at query
time. I also remember suggesting that we store the derived classifications
in a derivedClassification property in the entity which would contain the
list of derived classifications. Or we could store them as a new type of
edge, "propagated classification" edges to the real classification. I like
the edge idea.
To enable queries like ‘get list of entities that are classified as PII’,
it will perform better if each entity vertex has data about the propagated
classifications as well, similar to entities currently having data on the
classifications directly associated with them. However, all the entities
should directly reference a single instance of a classification, so that
it will be easier to manage changes to classification attribute values.
Sarath will send an update on the design choices later next week.
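
As a rough Python sketch of the trade-off described here, and not the
design Sarath will propose: each entity vertex keeps the ids of its direct
and propagated classifications, and every id points at one shared
classification record. The dictionaries, ids and function names are all
invented for the example.

```python
# One shared record per classification instance (illustrative store).
classification_instances = {
    "c1": {"type": "PII", "attributes": {"level": "high"}},
}

# Each entity vertex lists the ids of its direct and propagated classifications.
vertices = {
    "term": {"direct": ["c1"], "propagated": []},
    "hive_column": {"direct": [], "propagated": ["c1"]},
    "hive_table": {"direct": [], "propagated": []},
}


def entities_classified_as(type_name):
    """Find entities by classification type without walking relationships."""
    matches = []
    for name, vertex in vertices.items():
        ids = vertex["direct"] + vertex["propagated"]
        if any(classification_instances[i]["type"] == type_name for i in ids):
            matches.append(name)
    return sorted(matches)


def attributes_seen_by(entity, type_name):
    """Return the attribute values an entity sees for a classification type."""
    vertex = vertices[entity]
    for i in vertex["direct"] + vertex["propagated"]:
        if classification_instances[i]["type"] == type_name:
            return classification_instances[i]["attributes"]
    return None
```

Changing an attribute on the single shared record is then visible from
every entity that references it, whether directly or through propagation.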



 



If we had the above, we could classify a Term as PSI, and use the semantic
mapping to propagate the classifications to the hive column. The hive
column would not pick up classifications defined in the area 3 model like
"SpineObject", which is defined as only applying to "GlossaryTerm".
Yes. This use case should be covered by the design discussed above.

Thanks,
Madhan



 



From: David Radley <[email protected]>
Date: Thursday, January 11, 2018 at 8:52 AM
To: Madhan Neethiraj <[email protected]>
Cc: atlas <[email protected]>
Subject: Tag propagation



 



Hi Madhan,

I had a look in the code - I was surprised that the tag propagation was
not in. Is this something you are looking at in the near future? If not I
may need to look into it. I suggest that phase 1 of the tag propagation
implementation should:

- lose BOTH - this is still in the code - I think we agreed we wanted to
get rid of this.
- should honour the classification entityTypes - so that we do not get
classifications applied to inappropriate entityTypes
- There is the question about how the propagated classifications would
look in the get entity REST API - I suggest that they appear in the
entity's classifications with a field indicating that they are derived
(and hence not able to be removed by an entity update).
- I would hope that Ranger would pick up these new propagated tags using
the existing tag sync.
- I think you wanted the derived classifications to be picked up at query
time. I also remember suggesting that we store the derived classifications
in a derivedClassification property in the entity which would contain the
list of derived classifications. Or we could store them as a new type of
edge, "propagated classification" edges to the real classification. I like
the edge idea.

If we had the above, we could classify a Term as PSI, and use the semantic
mapping to propagate the classifications to the hive column. The hive
column would not pick up classifications defined in the area 3 model like
"SpineObject", which is defined as only applying to "GlossaryTerm".

What do you think?   all the best, David.



Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU












