[ https://issues.apache.org/jira/browse/ATLAS-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Umesh Patil resolved ATLAS-4334.
--------------------------------
Fix Version/s: 3.0.0
Resolution: Fixed
*Fix Summary:*
We have fixed the issue where the same aws_s3_v2_directory entity in Atlas was
being overwritten whenever a new external Hive table was created.
*What we did:*
1) Ensured that the correct JARs (hadoop-aws and aws-java-sdk-bundle) are
placed in the Hive lib directory so that the S3A filesystem works.
2) Restarted the Hive container to pick up the new dependencies.
3) Verified table creation via Beeline using s3a://... paths.
4) Used the Atlas REST API to check that:
• For *each* external table created, a *new* entity of type
aws_s3_v2_directory is now created (instead of an existing one being reused or overwritten).
• The correct hive_table entities are being created and linked to the S3
directories.
5) Validated lineage and metadata in Atlas:
• Queried aws_s3_v2_directory entities via REST to check their qualifiedName values.
• Queried hive_table entities and confirmed that their qualifiedNames are correct.
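The REST checks in steps 4 and 5 can be sketched in Python roughly as follows. This is a minimal sketch, assuming the Atlas v2 basic-search endpoint (GET /api/atlas/v2/search/basic with a typeName parameter), the local Atlas on port 21000 listed under *Tested On*, and HTTP basic authentication; the helper names and the admin/admin credentials are hypothetical placeholders, not values taken from this fix.

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumptions: local Atlas on port 21000 (as under "Tested On") and
# placeholder basic-auth credentials; replace both for your environment.
ATLAS_URL = "http://localhost:21000"
USER, PASSWORD = "admin", "admin"  # hypothetical credentials


def basic_search_url(base_url: str, type_name: str, limit: int = 100) -> str:
    """Build a GET URL for Atlas basic search filtered by entity type."""
    query = urllib.parse.urlencode({
        "typeName": type_name,
        "excludeDeletedEntities": "true",
        "limit": limit,
    })
    return f"{base_url}/api/atlas/v2/search/basic?{query}"


def qualified_names_by_guid(search_result: dict) -> dict:
    """Map each entity GUID in a basic-search response to its qualifiedName."""
    return {
        entity["guid"]: entity.get("attributes", {}).get("qualifiedName")
        for entity in search_result.get("entities", [])
    }


def fetch_qualified_names(type_name: str) -> dict:
    """Query a live Atlas server; needs network access and valid credentials."""
    request = urllib.request.Request(basic_search_url(ATLAS_URL, type_name))
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        # Basic search omits "entities" when nothing matches; handled above.
        return qualified_names_by_guid(json.load(response))
```

For example, with the fix in place, fetch_qualified_names("aws_s3_v2_directory") after creating three external tables should return three GUIDs whose qualifiedName values are all distinct.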
*Result:*
* External table creation now correctly registers *multiple* separate
aws_s3_v2_directory entities in Atlas (as expected).
* No more overwriting of existing S3 directory entities when new external
tables/databases are created.
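The "no more overwriting" result can be checked by comparing two snapshots of aws_s3_v2_directory entities, each a mapping from GUID to qualifiedName (for example, collected through the Atlas REST API before and after running a CREATE EXTERNAL TABLE statement). The sketch below uses hypothetical helper names and is not part of Atlas: an entity that keeps its GUID but changes its qualifiedName is exactly the symptom reported in this issue.

```python
from typing import Dict, List, Tuple


def overwritten_entities(
    before: Dict[str, str], after: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (guid, old_qualifiedName, new_qualifiedName) for every entity
    present in both snapshots whose qualifiedName changed."""
    changed = []
    for guid, old_qn in before.items():
        new_qn = after.get(guid)
        if new_qn is not None and new_qn != old_qn:
            changed.append((guid, old_qn, new_qn))
    return changed


def new_entities(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Return the GUIDs that appear only in the later snapshot, sorted."""
    return sorted(set(after) - set(before))
```

With the fix, creating a second external table should leave overwritten_entities empty and make new_entities return one new GUID; the buggy behaviour shows up as one changed entry and no new GUIDs.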
*Tested On:*
* Local Atlas (port 21000)
* Hive in Docker (atlas-hive container)
* S3 bucket: atlas-hive-demp-bucker in ap-south-1
*Closing as Fixed for version 3.0.0.*
> Creating an external table/database overrides the qualifiedName of an already
> existing aws_s3_v2_directory type
> ---------------------------------------------------------------------------------------------------------------
>
> Key: ATLAS-4334
> URL: https://issues.apache.org/jira/browse/ATLAS-4334
> Project: Atlas
> Issue Type: Bug
> Components: atlas-core
> Reporter: Umesh Padashetty
> Assignee: Umesh Patil
> Priority: Critical
> Fix For: 3.0.0
>
>
> The expectation is that every time an EXTERNAL table is created in Hive,
> Atlas creates an entity of type hive_process connecting the
> aws_s3_v2_directory and the hive_table.
> A new, unique entity of each type (aws_s3_v2_directory and hive_table) should
> be created in Atlas for every new external table.
> For instance, if I create an external table named test_ext_1, then both an
> aws_s3_v2_directory entity and a hive_table entity are created with a similar
> name, test_ext_1.
> However, I am observing unexpected behaviour: whenever a new external table is
> created, a new hive_table entity is created, but the previously created
> aws_s3_v2_directory entity is overwritten with the new qualifiedName.
> For instance, I ran the following queries:
> * create external table test_ext_1(name string);
> * create external table test_ext_2(name string);
> * create external table test_ext_3(name string);
> * create database net1;
> The expectation is that these 4 queries create:
> * 3 hive_table entities
> * 1 hive_db entity
> * 3 aws_s3_v2_directory entities
> But they actually create:
> * 3 hive_table entities
> * 1 hive_db entity
> * 1 aws_s3_v2_directory entity
> The same aws_s3_v2_directory entity gets updated with a new qualifiedName every
> time I create a new external table or database.
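The expected and actual outcomes in the report can be expressed as a per-type count check. This is a sketch under stated assumptions: it takes a list of entity headers, each with a typeName key (as in Atlas basic-search results), containing only the entities produced by the four queries; anything left over from earlier tests would need to be filtered out first. EXPECTED, count_by_type, and mismatches are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Expected entity counts for the four queries in the report.
EXPECTED = {"hive_table": 3, "hive_db": 1, "aws_s3_v2_directory": 3}


def count_by_type(entities: Iterable[dict]) -> Counter:
    """Count entity headers (dicts with a 'typeName' key) per type."""
    return Counter(entity["typeName"] for entity in entities)


def mismatches(actual: Counter, expected: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return {typeName: (expected, actual)} for every type whose count differs."""
    return {
        type_name: (count, actual.get(type_name, 0))
        for type_name, count in expected.items()
        if actual.get(type_name, 0) != count
    }
```

Fed the reported outcome (3 hive_table, 1 hive_db, 1 aws_s3_v2_directory), mismatches flags only aws_s3_v2_directory, expected 3 but found 1.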
--
This message was sent by Atlassian Jira
(v8.20.10#820010)