Hi Madhan and Sarath,

It occurs to me that we are introducing two new definitions around classifications that require the code to traverse the graph.

- classificationDefs now have entityTypes to restrict the entities that they can be applied to. This requires us to check the entity and classification hierarchies to ensure that inherited entities and classifications abide by these restrictions. This is currently done in code in AtlasClassificationType: one set of checks at classification add / update time and another when we try to add a classification to an entity.
- The tag propagation implementation is currently in review and aims to work out where tags should be propagated to using Gremlin TP2 queries. The currently proposed query is neat, around 10 lines long, but does not account for inheritance or entityType restrictions.

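To make the first check concrete, here is a minimal sketch of the kind of hierarchy walk involved. It is written in Python for brevity and is not the AtlasClassificationType code; TypeDef, ancestors, can_apply and the sample types are hypothetical names, and it assumes that an empty entityTypes list means "no restriction at this level" and that a restriction is satisfied when the entity type or any of its supertypes is listed.

```python
from dataclasses import dataclass, field


@dataclass
class TypeDef:
    """Hypothetical stand-in for an entity or classification type definition."""
    name: str
    super_types: list = field(default_factory=list)   # names of direct supertypes
    entity_types: list = field(default_factory=list)  # classifications only


def ancestors(name, defs):
    """Return the named type plus every type it inherits from."""
    seen, stack = set(), [name]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(defs[current].super_types)
    return seen


def can_apply(classification, entity_type, classification_defs, entity_defs):
    """Check the entity type against each restriction in the classification hierarchy."""
    entity_lineage = ancestors(entity_type, entity_defs)
    for cls_name in ancestors(classification, classification_defs):
        allowed = classification_defs[cls_name].entity_types
        # Assumption: an empty list means "no restriction at this level".
        if allowed and not entity_lineage.intersection(allowed):
            return False
    return True


# Made-up sample types, for illustration only.
ENTITY_DEFS = {
    "Asset": TypeDef("Asset"),
    "Table": TypeDef("Table", super_types=["Asset"]),
    "Column": TypeDef("Column", super_types=["Asset"]),
    "View": TypeDef("View", super_types=["Table"]),
}
CLASSIFICATION_DEFS = {
    "PII": TypeDef("PII", entity_types=["Asset"]),
    "ColumnPII": TypeDef("ColumnPII", super_types=["PII"], entity_types=["Column"]),
}
```

With these sample types, ColumnPII can be applied to a Column but not to a View, while PII can be applied to both because each inherits from Asset. Whatever ends up in the Gremlin query would need to give the same answers as this kind of walk.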
If we carry on with the current approach, we potentially need to implement checking down the graph both in the type code and in the Gremlin query. I wonder if we can have a consistent approach, so that we use Gremlin queries in both scenarios or code in both scenarios. I see a few options:

1) Carry on as is: code for classification entityTypes, TP2 query for tag propagation. The TP2 query may become much more complex, as it will need to recurse around the classification types and the entity types in the graph as well as the instance graph. The entityTypes Gremlin logic will need to match the entityTypes checking code logic.
2) Move all the logic to code. This should mean we work at TP3, and may give us more flexibility to handle the tag propagation overrides we will need at a later date.
3) Move all navigation logic to Gremlin queries. This is appealing, as the graph engine can then optimize the queries.
4) Extend 3) to store (cache) some of the inherited state in the instance graph so that a simpler query can be made. We could also extend this approach to store when a user overrides the default propagation. I know we have concerns about duplicating metadata. I wonder if we could split the properties in the vertices so there is a defined section and a derived / cached section, so it is obvious which properties might need re-calculating.

Thoughts?

All the best,
David
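
P.S. A rough sketch of the defined / derived split from option 4, in case it helps the discussion. It is Python over plain dictionaries, not Atlas or Gremlin code; new_vertex, recompute_derived and the property names are hypothetical, and it assumes tags simply propagate along directed edges, ignoring entityTypes restrictions, inheritance and user overrides.

```python
DEFINED = "defined"   # properties set directly by users or the API
DERIVED = "derived"   # cached properties that can always be rebuilt


def new_vertex(guid, classifications=()):
    """Build a vertex whose properties are split into two sections."""
    return {
        "guid": guid,
        DEFINED: {"classifications": set(classifications)},
        DERIVED: {"propagated": set()},
    }


def recompute_derived(vertices, edges):
    """Rebuild every derived section from the defined sections alone."""
    for vertex in vertices.values():
        vertex[DERIVED]["propagated"] = set()
    for source in vertices.values():
        tags = source[DEFINED]["classifications"]
        if not tags:
            continue
        # Assumption: tags flow along directed edges to every reachable vertex.
        seen, stack = {source["guid"]}, [source["guid"]]
        while stack:
            for target in edges.get(stack.pop(), ()):
                if target not in seen:
                    seen.add(target)
                    vertices[target][DERIVED]["propagated"] |= tags
                    stack.append(target)


# Made-up instance graph: table -> view -> report.
VERTICES = {
    "table": new_vertex("table", ["PII"]),
    "view": new_vertex("view"),
    "report": new_vertex("report"),
}
EDGES = {"table": ["view"], "view": ["report"]}
recompute_derived(VERTICES, EDGES)
```

Because the derived section is rebuilt only from defined data, it can be dropped and recalculated at any time, which is what would make the duplication tolerable. A read query then only needs the union of the two sections on a single vertex.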
