[
https://issues.apache.org/jira/browse/ATLAS-535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168154#comment-15168154
]
Suma Shivaprasad commented on ATLAS-535:
----------------------------------------
{noformat}
Modelling DELETE cascades across entities
Background
Currently, the Typesystem allows relationship behaviour between types to be
modelled through attribute flags. The isComposite flag on an attribute states
that the current type and the attribute's type have a “composition”
relationship: the referred instance is loaded or deleted whenever the current
instance is loaded or deleted. For example, hive_table.columns is marked
isComposite, so whenever a table is loaded or deleted, its columns are also
loaded or deleted.
API changes
The deleteEntity API should take an additional flag to indicate cascading deletes.
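A minimal sketch of what this API change could look like, assuming a hypothetical EntityStore interface (the real Atlas deleteEntity signature may differ). The existing single-argument call keeps its current non-cascading behaviour by delegating to the new overload with cascade = false, so existing callers are unaffected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: the names are illustrative, not the actual Atlas API.
interface EntityStore {
    // New overload: cascade == true also deletes dependent entities.
    void deleteEntity(String guid, boolean cascade);

    // Existing call keeps its non-cascading behaviour.
    default void deleteEntity(String guid) {
        deleteEntity(guid, false);
    }
}

// Toy implementation that only records the calls it receives.
final class RecordingStore implements EntityStore {
    final List<String> calls = new ArrayList<>();

    @Override
    public void deleteEntity(String guid, boolean cascade) {
        calls.add(guid + ":" + cascade);
    }
}
```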
Modelling/Repository changes
Option 1:
Add an attribute of type array<hive_table> to hive_db.
Pros:
Works out of the box and does not need any code changes.
Since the entity being deleted is also the source from which the delete cascade
begins (i.e., it is the parent entity), we know exactly which edges (the ones
labelled __typeName.attributeName) and which vertices are to be deleted.
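To illustrate why Option 1 needs no scan, here is a toy in-memory sketch (not the Atlas graph repository). It assumes the __typeName.attributeName edge-label convention described above and a hypothetical attribute name "tables" for the array<hive_table> on hive_db. Because the parent knows its composite attribute, it can build the exact edge label and delete only those targets. For brevity the sketch does not recurse into the tables' own composites (e.g. columns) or clean up other edges that point at deleted vertices.

```java
import java.util.*;

// Toy property graph; not the Atlas repository API.
final class ToyGraph {
    // vertex id -> (edge label -> target vertex ids)
    final Map<String, Map<String, List<String>>> out = new HashMap<>();
    // vertex id -> type name
    final Map<String, String> typeOf = new HashMap<>();

    void addVertex(String id, String type) {
        typeOf.put(id, type);
        out.putIfAbsent(id, new HashMap<>());
    }

    void addEdge(String from, String label, String to) {
        out.get(from).computeIfAbsent(label, k -> new ArrayList<>()).add(to);
    }

    // Option 1: the parent knows its composite attribute, so the label
    // "__<typeName>.<attributeName>" identifies exactly what to delete.
    void deleteWithComposite(String id, String compositeAttr) {
        String label = "__" + typeOf.get(id) + "." + compositeAttr;
        for (String child : out.get(id).getOrDefault(label, List.of())) {
            out.remove(child);
            typeOf.remove(child);
        }
        out.remove(id);
        typeOf.remove(id);
    }
}
```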
Cons:
The current support for such a composite attribute is limited in some cases.
For example, Database -> Table and Table -> Partitions could have issues, since
adding a partition requires updating the Table entity to append the partition
to its array<partition>, which may not scale. Taking hourly and daily
partitions over five years as the worst case, a table could have around ~50000
partition entries. It is also unclear what average number of tables per
Database we should support.
We will also have to implement another attribute flag, isVisible or lazyFetch,
so that the tables are not loaded/displayed, or are fetched lazily, when a
database is loaded, since this attribute exists for internal reasons and should
not be displayed when a database is viewed. If we add lazyFetch, should we load
all the entries in the array?
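The ~50000 figure can be checked with a quick calculation. This sketch assumes 365-day years and reads the worst case as hourly and daily partitions both being retained for five years.

```java
// Rough worst-case partition count per table: hourly plus daily
// partitions retained for the given number of (365-day) years.
final class PartitionEstimate {
    static long partitions(int years) {
        long hourly = 24L * 365 * years; // 43800 for five years
        long daily = 365L * years;       // 1825 for five years
        return hourly + daily;
    }
}
```

For five years this gives 43800 + 1825 = 45625 partitions, consistent with the ~50000 estimate, and every one of them would be an entry in the Table's array<partition> under Option 1.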
Option 2:
Add an attribute flag called isInverseComposite on hive_table.db.
In this case, whenever an instance of hive_db needs to be deleted, we need to
look at all the incoming edges with labels starting with __hive_db and their
source vertices, look up their type definitions and check whether the
isInverseComposite flag is set on the attribute that refers to the current
type. If it is set, remove the corresponding vertices and edges.
Get and update behaviour is not affected by this flag.
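A toy sketch of the Option 2 delete path, using hypothetical in-memory classes rather than the Atlas repository. It scans every incoming edge of the vertex being deleted and parses each label with the __typeName.attributeName convention described under Option 1. For example, it would parse a table's db reference labelled __hive_table.db into hive_table and db, then look up isInverseComposite for that attribute. Only referrers whose attribute carries the flag are removed. Recursion into the removed vertices' own composites is left out.

```java
import java.util.*;

// Toy graph and type registry for illustrating Option 2; not Atlas classes.
final class InverseCompositeDemo {
    // typeName -> (attributeName -> isInverseComposite)
    final Map<String, Map<String, Boolean>> inverseComposite = new HashMap<>();
    // vertex id -> incoming edges, each stored as {label, sourceVertexId}
    final Map<String, List<String[]>> incoming = new HashMap<>();
    final Set<String> vertices = new HashSet<>();

    void defineAttribute(String type, String attr, boolean isInverseComposite) {
        inverseComposite.computeIfAbsent(type, k -> new HashMap<>()).put(attr, isInverseComposite);
    }

    void addVertex(String id) { vertices.add(id); }

    // Adds the edge source --label--> target, where label is "__<type>.<attr>".
    void addEdge(String source, String label, String target) {
        incoming.computeIfAbsent(target, k -> new ArrayList<>()).add(new String[] {label, source});
    }

    void delete(String id) {
        // Scan every incoming edge and check the referring attribute's flag.
        for (String[] edge : incoming.getOrDefault(id, List.of())) {
            String label = edge[0];
            int dot = label.indexOf('.');
            if (!label.startsWith("__") || dot < 0) continue; // not an attribute edge
            String type = label.substring(2, dot);             // strip the "__" prefix
            String attr = label.substring(dot + 1);
            boolean cascade = inverseComposite
                    .getOrDefault(type, Map.of())
                    .getOrDefault(attr, false);
            if (cascade) {
                vertices.remove(edge[1]); // delete the referring vertex too
            }
        }
        incoming.remove(id);
        vertices.remove(id);
    }
}
```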
Pros:
Simple approach that does not need intrusive code changes.
Cons:
An additional flag that users need to define in the type definition.
Deletion needs to iterate over all the incoming edges (potentially a large
number) and check which ones have labels starting with that typeName prefix.
However, on average a type will have only one or at most two such attributes
with a potentially large number of edges, so the scan would mostly go through
vertices that need to be deleted anyway.
Option 3:
There is currently no way to model associations between two types/classes.
The proposal is to model this generically, so as to be able to represent
various association rules between types that are not attribute specific. For
example, Database to Table is a composition relationship.
Define a new generic internal type
AssociationRule
attributes:
String targetType // the type with which the association rule is being defined
String name // the name of this rule
Note: Typesystem will enforce a typecheck on the targetType using existing
types.
A type definition will have a Collection<AssociationRule> along with the
existing attribute definitions, traits, etc.
CascadeRule extends AssociationRule
DeleteCascadeRule extends CascadeRule
Currently the only cascade type supported is DELETE.
However, it could later be extended to various other types, like the JPA
cascade types (for updates, gets, etc.):
https://docs.oracle.com/javaee/6/api/javax/persistence/CascadeType.html
Going forward, AssociationRule(s) could also be attached at the attribute
level, i.e., isComposite on an attribute could be replaced with a
DeleteCascadeRule. The same set of association rules would then apply at both
the type and attribute levels.
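A sketch of the proposed rule types as plain Java classes. The class names AssociationRule, CascadeRule and DeleteCascadeRule and the targetType/name attributes come from the proposal. The constructors, the TypeDef holder and the knownTypes check are illustrative assumptions. The check stands in for the note that the Typesystem enforces a type check on targetType against existing types.

```java
import java.util.*;

// Rule hierarchy as proposed; everything beyond the class names and the
// targetType/name attributes is illustrative.
abstract class AssociationRule {
    final String targetType; // the type with which the rule is defined
    final String name;       // the name of this rule

    AssociationRule(String targetType, String name) {
        this.targetType = targetType;
        this.name = name;
    }
}

abstract class CascadeRule extends AssociationRule {
    CascadeRule(String targetType, String name) { super(targetType, name); }
}

final class DeleteCascadeRule extends CascadeRule {
    DeleteCascadeRule(String targetType, String name) { super(targetType, name); }
}

// Hypothetical type definition holding rules alongside attribute definitions.
final class TypeDef {
    final String name;
    final Map<String, String> attributes = new LinkedHashMap<>(); // attr name -> type name
    final Collection<AssociationRule> rules = new ArrayList<>();

    TypeDef(String name) { this.name = name; }

    // Rejects rules whose targetType is not an already-registered type.
    void addRule(AssociationRule rule, Set<String> knownTypes) {
        if (!knownTypes.contains(rule.targetType)) {
            throw new IllegalArgumentException("Unknown targetType: " + rule.targetType);
        }
        rules.add(rule);
    }
}
```

Keeping DeleteCascadeRule as a subclass of CascadeRule leaves room for the other cascade kinds mentioned above without changing how a TypeDef stores its rules.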
When a delete with cascade is issued on an entity and its type contains a
DeleteCascadeRule, delete any entities referenced from this entity that are of
the targetType. For example, when a hive_db entity is deleted, all the
hive_table entities associated with it are deleted. To find the vertices to
delete, it follows all edges whose labels start with the targetType name,
__hive_table, and deletes the referred vertices. This should work for all the
complex and collection types: array, map, struct and class references.
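A toy sketch of the Option 3 delete path with hypothetical in-memory structures. The type's DeleteCascadeRules are reduced to a map from type name to targetType names. When an entity is deleted, the sketch follows every edge touching it whose label starts with "__" + targetType + "." and removes the vertex at the other end if it is of that targetType. The trailing "." is an assumption that keeps a prefix such as __hive_table from also matching an unrelated type like __hive_table_x. Dangling edge entries on the removed vertices are not cleaned up here.

```java
import java.util.*;

// Toy sketch of Option 3's delete: rule lookup plus edge-label prefix scan.
final class RuleCascadeDemo {
    // typeName -> targetTypes named by that type's DeleteCascadeRules
    final Map<String, Set<String>> deleteCascadeTargets = new HashMap<>();
    // vertex id -> type name
    final Map<String, String> typeOf = new HashMap<>();
    // vertex id -> edges touching it, stored as {label, otherVertexId}
    final Map<String, List<String[]>> edges = new HashMap<>();

    void addVertex(String id, String type) { typeOf.put(id, type); }

    // Records an edge labelled "__<sourceType>.<attr>" on both endpoints.
    void addEdge(String source, String label, String target) {
        edges.computeIfAbsent(source, k -> new ArrayList<>()).add(new String[] {label, target});
        edges.computeIfAbsent(target, k -> new ArrayList<>()).add(new String[] {label, source});
    }

    void deleteCascade(String id) {
        Set<String> targets = deleteCascadeTargets.getOrDefault(typeOf.get(id), Set.of());
        for (String[] e : edges.getOrDefault(id, List.of())) {
            for (String target : targets) {
                // e.g. "__hive_table.db" starts with "__hive_table."
                if (e[0].startsWith("__" + target + ".") && target.equals(typeOf.get(e[1]))) {
                    typeOf.remove(e[1]);
                }
            }
        }
        edges.remove(id);
        typeOf.remove(id);
    }
}
```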
Pros:
Generic: it can be used to define any association between two types and to
apply it in any aspect of Atlas, e.g. entity mutation (updates, gets, delete
behaviour, etc.).
The current hive model of the Table -> Database reference will not need to
change, so adding a table requires no extra update to the database, as it
would in Option 1.
Cons:
It is more intrusive and needs changes in the type system in addition to entity
mutation.
Deletion needs to iterate over all the edges (potentially a large number) and
check which ones have labels starting with that typeName prefix. However, on
average a type will have only one or at most two such attributes with a
potentially large number of edges, so the scan would mostly go through
vertices that need to be deleted anyway. Also, deletes are likely to be less
frequent than creates/updates.
Due to its simplicity and non-intrusive code changes, I am leaning towards Option 2.
Thoughts?
{noformat}
> Support delete cascade efficently
> ---------------------------------
>
> Key: ATLAS-535
> URL: https://issues.apache.org/jira/browse/ATLAS-535
> Project: Atlas
> Issue Type: Sub-task
> Reporter: Suma Shivaprasad
> Fix For: 0.7-incubating
>
>
> Currently there are some limitations in the typesystem and modelling to
> support delete cascades at scale through the isComposite flag