[
https://issues.apache.org/jira/browse/ATLAS-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jiaqi Shan updated ATLAS-2916:
------------------------------
Description:
When a Hive process is created, the ColumnEntity is updated with the value of
the TableEntity's AtlasObjectId. The example below shows the differences
between CurrentEntity and EntityInStore.
{panel:title=CurrentEntity}
AtlasEntity{AtlasStruct{typeName='hive_column', attributes=[owner:bi_sh,
qualifiedName:bi.yingbang.t@bdyf, name:t, comment:t, position:1, type:int,
table:{color:#d04437}AtlasObjectId\{guid='-17020754238878791',
typeName='hive_table',
uniqueAttributes={qualifiedName:b.yingbang@bdyf}}{color}]}guid='431c8847-8fd2-454d-b77a-19aeef0d6b9b',
status=null, createdBy='null', updatedBy='null', createTime=null,
updateTime=null, version=0, relationshipAttributes=[], classifications=[],
meanings=[]}
{panel}
{panel:title=EntityInStore}
AtlasEntity{AtlasStruct{typeName='hive_column', attributes=[owner:bi_sh,
qualifiedName:bi.yingbang.t@bdyf, name:t, description:null, comment:t,
position:1, type:int,
{color:#d04437}table:AtlasObjectId\{guid='da35aff2-9851-499d-99cf-f1fbafb6e92b',
typeName='hive_table',
uniqueAttributes={}}{color}]}guid='431c8847-8fd2-454d-b77a-19aeef0d6b9b',
status=ACTIVE, createdBy='bi_sh', updatedBy='bi_sh',
createTime=2018-10-09T11:26:51.685Z, updateTime=2018-10-09T11:26:51.685Z,
version=0, relationshipAttributes=[], classifications=[], meanings=[]}
{panel}
No metadata actually changes in the ColumnEntity; the difference in the
table's AtlasObjectId is caused by the Hive hook setting a new guid for the
TableEntity. So it may not be necessary to update the Hive column entity in
this case.
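To illustrate why the two table references above can point to the same table
even though their guids differ, here is a minimal Java sketch. It is not Atlas
code: ObjectRef, RefComparator and the resolver lookup are hypothetical names,
and it assumes that a guid starting with "-" (as in CurrentEntity) is a
temporary one that must be resolved through uniqueAttributes.

```java
import java.util.Map;
import java.util.Objects;
import java.util.function.BiFunction;

/** Hypothetical stand-in for an object reference (guid, typeName, uniqueAttributes). */
final class ObjectRef {
    final String guid;
    final String typeName;
    final Map<String, String> uniqueAttributes;

    ObjectRef(String guid, String typeName, Map<String, String> uniqueAttributes) {
        this.guid = guid;
        this.typeName = typeName;
        this.uniqueAttributes = uniqueAttributes;
    }

    /** Assumption: temporary guids set by the hook start with "-". */
    boolean hasAssignedGuid() {
        return guid != null && !guid.startsWith("-");
    }
}

final class RefComparator {
    /**
     * Returns true when both references point to the same stored entity.
     * The resolver is a hypothetical lookup from (typeName, uniqueAttributes)
     * to the guid held in the store; it returns null when nothing matches.
     */
    static boolean sameTarget(ObjectRef incoming, ObjectRef stored,
                              BiFunction<String, Map<String, String>, String> resolver) {
        if (!Objects.equals(incoming.typeName, stored.typeName)) {
            return false;
        }
        // Use the guid directly if it is assigned, otherwise resolve it.
        String incomingGuid = incoming.hasAssignedGuid()
                ? incoming.guid
                : resolver.apply(incoming.typeName, incoming.uniqueAttributes);
        return incomingGuid != null && incomingGuid.equals(stored.guid);
    }
}
```

With a comparison like this, the table attribute of the two column entities
above would be treated as equal, so the column would not count as changed.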
We propose adding an LRU cache to skip updating an entity that was already
sent in an earlier notification. However, this solution fails when an entity
is deleted and then re-created with the same uniqueAttributes.
Is there a better solution to avoid this problem?
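A minimal Java sketch of the LRU cache idea, for discussion only. The class
SeenEntityCache and its methods are hypothetical and not part of Atlas. The
fingerprint is assumed to be a caller-supplied string, for example a hash of
the entity attributes that ignores temporary guids. The invalidate method
shows one possible way to handle the delete and re-create case, assuming
delete notifications pass through the same code path.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: remembers the last fingerprint sent per entity key. */
final class SeenEntityCache {
    private final Map<String, String> lru;

    SeenEntityCache(final int capacity) {
        // accessOrder=true keeps the least recently used entry first,
        // so removeEldestEntry evicts it once the capacity is exceeded.
        this.lru = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Records the fingerprint and returns true if it equals the previous one. */
    synchronized boolean shouldSkip(String typeName, String qualifiedName, String fingerprint) {
        String previous = lru.put(typeName + ":" + qualifiedName, fingerprint);
        return fingerprint.equals(previous);
    }

    /** Without this call on delete, a re-created entity would be wrongly skipped. */
    synchronized void invalidate(String typeName, String qualifiedName) {
        lru.remove(typeName + ":" + qualifiedName);
    }
}
```

The key is built from typeName and qualifiedName rather than the guid, because
the guid is exactly the value that changes between notifications.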
> Unnecessary entity update caused by different AtlasObjectId
> -----------------------------------------------------------
>
> Key: ATLAS-2916
> URL: https://issues.apache.org/jira/browse/ATLAS-2916
> Project: Atlas
> Issue Type: Improvement
> Components: atlas-core
> Affects Versions: 1.0.0
> Reporter: Jiaqi Shan
> Priority: Minor