David,

 

Sarath was working on tag propagation, but had to take up tasks related to 
JanusGraph and others. He will be resuming tag-propagation work next week; this 
feature would be part of the Atlas-1.0.0 release.

 

- lose BOTH - this is still in the code - I think we agreed we wanted to get 
rid of this. 
Agree.

 

- should honour the classification entityTypes - so that we do not get 
classifications applied to inappropriate entityTypes   
Perhaps we should stop the propagation at the entity where the classification 
is not applicable? I think it wouldn’t be correct to block a classification 
from being associated with an entity just because the classification is not 
applicable to a downstream entity.
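
A minimal sketch of the stop-at-inapplicable-entity behaviour described above. 
This is only an illustration, not Atlas code: the function `propagate`, the 
`downstream` adjacency map, the `entity_type` map and the example GUIDs are all 
hypothetical, and super-type/sub-type checks are left out.

```python
from collections import deque


def propagate(source_guid, downstream, entity_type, applicable_types):
    """Return GUIDs of downstream entities that receive the classification.

    The source entity keeps its direct association regardless. Traversal
    stops at any downstream entity whose type is not applicable, so entities
    beyond it are not reached through that path.
    """
    reached = []
    seen = {source_guid}
    queue = deque(downstream.get(source_guid, []))
    while queue:
        guid = queue.popleft()
        if guid in seen:
            continue
        seen.add(guid)
        # Assumption: an empty applicable_types set means "applies to all types".
        if applicable_types and entity_type[guid] not in applicable_types:
            continue  # stop here; do not follow this entity's downstream edges
        reached.append(guid)
        queue.extend(downstream.get(guid, []))
    return reached


# Hypothetical lineage: table-1 -> process-1 -> table-2 -> term-1 -> table-3
downstream = {
    "table-1": ["process-1"],
    "process-1": ["table-2"],
    "table-2": ["term-1"],
    "term-1": ["table-3"],
}
entity_type = {
    "table-1": "hive_table",
    "process-1": "hive_process",
    "table-2": "hive_table",
    "term-1": "GlossaryTerm",
    "table-3": "hive_table",
}

result = propagate("table-1", downstream, entity_type,
                   {"hive_table", "hive_process"})
print(result)
```

In this example the classification is not applicable to "term-1", so 
propagation stops there and "table-3" is not reached via that path.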

 

- There is the question about how the propagated classifications would look in 
the get entity REST API - I suggest that they appear in the entity's 
classifications with a field indicating that they are derived (and hence not 
able to be removed by an entity update). 
I was thinking about a separate attribute, 
AtlasEntity.propagatedClassifications, for this. However, I think your 
suggestion of adding a field to AtlasClassification is a better one; with this 
approach no changes would be needed in applications that process 
classifications on an entity. How about we capture the guid of the source 
entity on which the classification is associated, 
AtlasClassification.sourceEntityGuid? If this value is null, then the 
classification is associated with the current entity directly.
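
A minimal sketch of how such a field could be read by a consumer. It is not the 
Atlas model: the `Classification` dataclass, its `source_entity_guid` field 
(standing in for the proposed AtlasClassification.sourceEntityGuid) and the 
helper `remove_direct` are hypothetical names used for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Classification:
    type_name: str
    # None means the classification is associated with this entity directly;
    # otherwise it holds the GUID of the entity it was propagated from.
    source_entity_guid: Optional[str] = None

    @property
    def is_propagated(self) -> bool:
        return self.source_entity_guid is not None


def remove_direct(classifications: List[Classification],
                  type_name: str) -> List[Classification]:
    """Drop directly associated classifications of the given type.

    Propagated classifications are left in place, since they are derived
    and should not be removable through an entity update.
    """
    return [
        c for c in classifications
        if not (c.type_name == type_name and not c.is_propagated)
    ]


# Hypothetical column: PII propagated from a table, Confidential set directly.
column = [
    Classification("PII", source_entity_guid="guid-table-1"),
    Classification("Confidential"),
]

after_pii = remove_direct(column, "PII")
after_confidential = remove_direct(column, "Confidential")
print([c.type_name for c in after_pii])
print([c.type_name for c in after_confidential])
```

Code that only iterates over an entity's classifications sees both kinds 
without change; only code that cares about the origin needs to look at the 
extra field.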

 

- I would hope that Ranger would pick up these new propagated tags using the 
existing tag sync. 
Yes. With the approach detailed above, no changes would be needed in Ranger.

 

- I think you wanted the derived classifications to be picked up at query time. 
I also remember suggesting that we store the derived classifications in a 
derivedClassification property in the entity which would contain the list of 
derived classifications. Or we could store them as a new type of edge, 
"propagated classification" edges, pointing to the real classification. I like 
the edge idea. 
To enable queries like ‘get list of entities that are classified as PII’, it 
would be more performant if each entity vertex also held data about its 
propagated classifications, similar to how entities currently hold data on 
the classifications directly associated with them. However, all the entities 
should reference a single instance of a classification, so that it will be 
easier to manage changes to classification attribute values. Sarath will send 
an update on the design choices later next week.
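
As a rough illustration of the two requirements above (per-entity data for fast 
lookup, and a single shared classification instance), here is a minimal 
sketch. The layout is an assumption made for this example and not the design 
Sarath will propose; the dictionaries, IDs and function name are hypothetical.

```python
# One shared instance per classification association (hypothetical layout).
classifications = {
    "c1": {"type_name": "PII", "attributes": {"level": "high"}},
}

# Each entity records references to its direct and propagated classifications,
# so a lookup does not need to walk the lineage graph.
entities = {
    "table-1": {"direct": ["c1"], "propagated": []},
    "column-1": {"direct": [], "propagated": ["c1"]},
    "column-2": {"direct": [], "propagated": []},
}


def entities_classified_as(type_name):
    """GUIDs whose direct or propagated references include the given type."""
    return sorted(
        guid
        for guid, refs in entities.items()
        if any(
            classifications[cid]["type_name"] == type_name
            for cid in refs["direct"] + refs["propagated"]
        )
    )


pii_entities = entities_classified_as("PII")
print(pii_entities)

# An attribute change is made once, on the shared instance, and is visible
# from every entity that references it.
classifications["c1"]["attributes"]["level"] = "low"
level_seen_from_column = classifications[
    entities["column-1"]["propagated"][0]
]["attributes"]["level"]
print(level_seen_from_column)
```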

 

If we had the above, we could classify a Term as PSI, and use the semantic 
mapping to propagate the classifications to the hive column. The hive column 
would not pick up classifications defined in the area 3 model like 
"SpineObject", which is defined as only applying to "GlossaryTerm".   
Yes. This use case should be covered by the design discussed above.

 

Thanks,

Madhan

 

From: David Radley <[email protected]>
Date: Thursday, January 11, 2018 at 8:52 AM
To: Madhan Neethiraj <[email protected]>
Cc: atlas <[email protected]>
Subject: Tag propagation

 

Hi Madhan, 
I had a look in the code - I was surprised that the tag propagation was not 
in. Is this something you are looking at in the near future? If not I may need 
to look into it. I suggest the tag propagation implementation in phase 1 
should: 
- lose BOTH - this is still in the code - I think we agreed we wanted to get 
rid of this. 
- should honour the classification entityTypes - so that we do not get 
classifications applied to inappropriate entityTypes   
- There is the question about how the propagated classifications would look in 
the get entity REST API - I suggest that they appear in the entity's 
classifications with a field indicating that they are derived (and hence not 
able to be removed by an entity update). 
- I would hope that Ranger would pick up these new propagated tags using the 
existing tag sync. 
- I think you wanted the derived classifications to be picked up at query time. 
I also remember suggesting that we store the derived classifications in a 
derivedClassification property in the entity which would contain the list of 
derived classifications. Or we could store them as a new type of edge, 
"propagated classification" edges, pointing to the real classification. I like 
the edge idea. 

If we had the above, we could classify a Term as PSI, and use the semantic 
mapping to propagate the classifications to the hive column. The hive column 
would not pick up classifications defined in the area 3 model like 
"SpineObject", which is defined as only applying to "GlossaryTerm".   

What do you think? All the best, David. 

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

