[ 
https://issues.apache.org/jira/browse/ATLAS-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Patil resolved ATLAS-4334.
--------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

*Fix Summary:*
We have fixed the issue where the same aws_s3_v2_directory entity in Atlas was 
being overwritten when creating new external Hive tables.

*What we did:*
1) Ensured that the correct JARs (hadoop-aws and aws-java-sdk-bundle) are 
placed in the Hive lib directory, so that the S3A filesystem works.
2) Restarted the Hive container to pick up the new dependencies.
3) Verified table creation via Beeline using s3a://... paths.
4) Used the Atlas REST API to check that:
  • For *each* external table created, a *new* entity of type 
aws_s3_v2_directory is now created (instead of re‑using/overwriting).
  • The correct hive_table entities are being created and linked to the S3 
directories.
5) Validated lineage and metadata in Atlas:
  • Queried aws_s3_v2_directory entities via REST to check each qualifiedName.
  • Queried hive_table entities and confirmed correct qualifiedNames.
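
The REST checks in steps 4 and 5 can be sketched roughly as below. This is a minimal illustration, not the exact commands that were run: it assumes the Atlas v2 basic search endpoint on the local instance (port 21000) with basic authentication, and the helper names (build_search_url, qualified_names, fetch_entities) are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumption: local Atlas instance, as listed under "Tested On".
ATLAS_URL = "http://localhost:21000"


def build_search_url(type_name, base_url=ATLAS_URL, limit=100):
    """Build the basic-search URL that lists entities of one type."""
    query = urllib.parse.urlencode({"typeName": type_name, "limit": limit})
    return f"{base_url}/api/atlas/v2/search/basic?{query}"


def qualified_names(search_result):
    """Extract qualifiedName from each entity header in a search result."""
    entities = search_result.get("entities") or []
    return [e.get("attributes", {}).get("qualifiedName") for e in entities]


def fetch_entities(type_name, user, password):
    """Call Atlas and return the parsed JSON search result.

    Credentials are supplied by the caller; none are assumed here.
    """
    request = urllib.request.Request(build_search_url(type_name))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the fix in place, qualified_names(fetch_entities("aws_s3_v2_directory", user, password)) should contain one distinct qualifiedName per external table, and the same call with "hive_table" should list the corresponding tables.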



*Result:*
 * External table creation now correctly registers *multiple* separate 
aws_s3_v2_directory entities in Atlas (as expected).

 * No more overwriting of existing S3 directory entities when new external 
tables/databases are created.

*Tested On:*
 * Local Atlas (port 21000)

 * Hive in Docker (atlas-hive container)

 * S3 bucket: atlas-hive-demp-bucker in ap-south-1


*Closing as Fixed for version 3.0.0.*

> Creating an external table/database overrides the qualifiedName of an already 
> existing aws_s3_v2_directory type
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: ATLAS-4334
>                 URL: https://issues.apache.org/jira/browse/ATLAS-4334
>             Project: Atlas
>          Issue Type: Bug
>          Components:  atlas-core
>            Reporter: Umesh Padashetty
>            Assignee: Umesh Patil
>            Priority: Critical
>             Fix For: 3.0.0
>
>
> The expectation is that every time an EXTERNAL table is created in Hive, 
> Atlas creates an entity of type hive_process connecting the 
> aws_s3_v2_directory and the hive_table.
> A new unique entity of each of the types aws_s3_v2_directory and hive_table 
> is created in Atlas for every new external table.
> For instance, if I create an external table with the name test_ext_1, then 
> an aws_s3_v2_directory entity and a hive_table entity are created with a 
> similar name, test_ext_1.
> However, I am observing a strange behaviour: whenever a new external table 
> is created, even though a new hive_table entity is created, the previously 
> created aws_s3_v2_directory entity is overwritten with the new 
> qualifiedName.
> For instance, I ran the queries below:
>  * create external table test_ext_1(name string);
>  * create external table test_ext_2(name string);
>  * create external table test_ext_3(name string);
>  * create database net1;
> The expectation is that the above 4 queries will create:
>  * 3 hive_table entities
>  * 1 hive_db entity
>  * 3 aws_s3_v2_directory entities 
> But they actually create:
>  * 3 hive_table entities
>  * 1 hive_db entity
>  * 1 aws_s3_v2_directory entity
> The same aws_s3_v2_directory gets updated with a new qualifiedName every time 
> I create a new external table or a database.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
