[ 
https://issues.apache.org/jira/browse/ATLAS-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Mestry updated ATLAS-1665:
-----------------------------------
    Description: 
h5.Background

The existing Export API implementation adds one *.json* file per entity when 
generating the ZIP file. This makes ZIP file creation inefficient: the ZIP 
files are 75% larger than they could be with fewer *.json* file entries.

h5.Solution

The implementation uses the new v2 API *AtlasEntityWithExtInfo* representation 
instead of *AtlasEntity*. This format combines an entity with its related 
entities in a single object. E.g. a *hive_table* will contain all the 
*hive_columns* that it is made up of. (See example section below.)

This significantly reduces the number of generated *JSON* files, which in turn 
reduces the size of the generated *ZIP* file.
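
The effect of combining entities can be illustrated with a minimal Python 
sketch. It is not the Atlas implementation: the entity dictionaries, the 
function names and the file naming are assumptions made for illustration, and 
only the *entity* plus *referredEntities* layout is modeled on 
*AtlasEntityWithExtInfo*.

```python
import io
import json
import zipfile


def make_entities(column_count):
    """Build one hypothetical hive_table entity plus its hive_column entities."""
    table = {"guid": "table-1", "typeName": "hive_table",
             "attributes": {"name": "sales"}}
    columns = [
        {"guid": f"col-{i}", "typeName": "hive_column",
         "attributes": {"name": f"c{i}"}}
        for i in range(column_count)
    ]
    return table, columns


def zip_one_file_per_entity(table, columns):
    """Old layout: every entity becomes its own .json entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for entity in [table, *columns]:
            zf.writestr(f"{entity['guid']}.json", json.dumps(entity))
    return buf.getvalue()


def zip_entity_with_ext_info(table, columns):
    """New layout: the table and its referred columns share one .json entry."""
    combined = {
        "entity": table,
        "referredEntities": {c["guid"]: c for c in columns},
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f"{table['guid']}.json", json.dumps(combined))
    return buf.getvalue()


def entry_count(zip_bytes):
    """Return the number of entries stored in an in-memory ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return len(zf.namelist())
```

Comparing *entry_count* and the byte length of the two archives for the same 
table shows the per-entry overhead that the combined layout avoids. The actual 
saving depends on the data being exported.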


  was:
h5.Background

The existing Export API implementation adds one *.json* file per entity when 
generating the ZIP file. This makes ZIP file creation inefficient: the ZIP 
files are 75% larger than they could be with fewer *.json* file entries.

h5.Solution

The implementation uses the new v2 API *AtlasEntityWithExtInfo* representation 
instead of *AtlasEntity*. This format combines an entity with its related 
entities in a single object. E.g. a *hive_table* will contain all the 
*hive_columns* that it is made up of. (See example section below.)

This significantly reduces the number of generated *JSON* files, which in turn 
reduces the size of the generated *ZIP* file.

h5.Implementation Details

*Export API*
- Modified the *Gremlin* query used to fetch connected entities to return each 
*guid* with a *boolean* indicating whether the entity is a process.
- _ExportService_: Modified the implementation to fetch 
*AtlasEntityWithExtInfo* instead of *AtlasEntity*. Modified the bookkeeping to 
save *process* (lineage) entities after all non-process entities are saved.
- _ZipSink_: Minor modification to serialize *AtlasEntityWithExtInfo*.
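
The export ordering described above can be sketched as follows. This is a 
simplified Python illustration, not the _ExportService_ code: the function 
name and the shape of the query result, a list of (guid, is_process) pairs, 
are assumptions.

```python
def order_for_export(connected):
    """Order guids so process (lineage) entities come after all others.

    `connected` is an iterable of (guid, is_process) pairs, the shape the
    modified query is described as returning (assumed here).
    """
    connected = list(connected)
    # Non-process entities are saved first, in the order they were fetched.
    non_process = [guid for guid, is_process in connected if not is_process]
    # Process entities are deferred until every other entity is saved.
    process = [guid for guid, is_process in connected if is_process]
    return non_process + process
```

Saving process entities last means that the entities a lineage entry refers to 
are already in the archive by the time the process entity is written.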

*Import API*
- _ZipSource_: Modified to source *AtlasEntityWithExtInfo*.
- _EntityImportStream_: Modified to source *AtlasEntityWithExtInfo*.
- _AtlasEntityStreamForImport.getGuid_: Modified to look up requested entities 
first in the stored *AtlasEntityWithExtInfo* object, and to request them from 
the stream only if they are not found there.
- _AtlasEntityStoreV1.bulkImport_: Minor modification to use the new stream 
changes.
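
The lookup order described for _AtlasEntityStreamForImport.getGuid_ can be 
sketched in Python as follows. The class, method and parameter names are 
hypothetical, and the stream is reduced to a callable. Only the order of the 
lookups is taken from the description above.

```python
class EntityStreamForImport:
    """Hypothetical stand-in for the import stream's guid lookup."""

    def __init__(self, current, fetch_from_stream):
        # `current` mimics a stored AtlasEntityWithExtInfo as a dict with
        # an "entity" key and an optional "referredEntities" key.
        self.current = current
        # `fetch_from_stream` is an assumed callable: guid -> entity or None.
        self.fetch_from_stream = fetch_from_stream

    def get_by_guid(self, guid):
        # 1. The main entity of the stored object.
        entity = self.current["entity"]
        if entity["guid"] == guid:
            return entity
        # 2. Entities carried along with it.
        referred = self.current.get("referredEntities", {})
        if guid in referred:
            return referred[guid]
        # 3. Fall back to the stream only when the guid was not found.
        return self.fetch_from_stream(guid)
```

Because related entities travel with the main entity, most lookups in this 
sketch are answered without going back to the stream.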




> Export API: Improve Generated ZIP File Using AtlasEntityWithExtInfo
> -------------------------------------------------------------------
>
>                 Key: ATLAS-1665
>                 URL: https://issues.apache.org/jira/browse/ATLAS-1665
>             Project: Atlas
>          Issue Type: Improvement
>          Components:  atlas-core
>    Affects Versions: 0.9-incubating
>            Reporter: Ashutosh Mestry
>            Assignee: Ashutosh Mestry
>             Fix For: trunk
>
>         Attachments: ATLAS-1665.patch
>
>
> h5.Background
> The existing Export API implementation adds one *.json* file per entity 
> when generating the ZIP file. This makes ZIP file creation inefficient: the 
> ZIP files are 75% larger than they could be with fewer *.json* file 
> entries.
> h5.Solution
> The implementation uses the new v2 API *AtlasEntityWithExtInfo* 
> representation instead of *AtlasEntity*. This format combines an entity with 
> its related entities in a single object. E.g. a *hive_table* will contain 
> all the *hive_columns* that it is made up of. (See example section below.)
> This significantly reduces the number of generated *JSON* files, which in 
> turn reduces the size of the generated *ZIP* file.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
