[
https://issues.apache.org/jira/browse/ATLAS-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ashutosh Mestry updated ATLAS-1665:
-----------------------------------
Description:
**Background**
==============
The existing implementation of the Export API, with respect to ZIP file
generation, adds one *.json* file per entity. This makes ZIP file creation
inefficient. The ZIP files are 75% larger than they could be with fewer
*.json* file entries.
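As a rough illustration of why many small entries inflate an archive, the sketch below writes the same made-up payload once as one ZIP entry per entity and once as a single combined entry, using only java.util.zip. The class name, the JSON strings and the entry names are assumptions for illustration; this is not the Export API code and it does not attempt to reproduce the 75% figure.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Illustrative only: compares one ZIP entry per entity with a single combined entry.
class ZipEntryOverheadSketch {

    // Hypothetical stand-in for a serialized entity; not the real Atlas JSON layout.
    static String fakeEntityJson(int i) {
        return "{\"typeName\":\"hive_column\",\"guid\":\"guid-" + i + "\",\"name\":\"col_" + i + "\"}";
    }

    // Writes every entity as its own <guid>.json entry and returns the archive size in bytes.
    static int sizeWithOneEntryPerEntity(int count) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (int i = 0; i < count; i++) {
                zip.putNextEntry(new ZipEntry("guid-" + i + ".json"));
                zip.write(fakeEntityJson(i).getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Writes all entities into one JSON array stored as a single entry.
    static int sizeWithSingleEntry(int count) {
        StringBuilder combined = new StringBuilder("[");
        for (int i = 0; i < count; i++) {
            if (i > 0) {
                combined.append(',');
            }
            combined.append(fakeEntityJson(i));
        }
        combined.append(']');

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("combined.json"));
            zip.write(combined.toString().getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println("one entry per entity: " + sizeWithOneEntryPerEntity(200) + " bytes");
        System.out.println("single entry:         " + sizeWithSingleEntry(200) + " bytes");
    }
}
```

Each entry carries its own local header and central-directory record, and is compressed on its own, so repeated keys across entities cannot be shared by the compressor.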
**Solution**
============
The implementation uses the new v2 API *AtlasEntityWithExtInfo* representation
instead of *AtlasEntity*. This format combines an entity with related entities
as one. E.g. *hive_table* will contain all the *hive_columns* that it is made
up of. (See example section below.)
This significantly reduces the number of generated *JSON* files, which in turn
reduces the size of the generated *ZIP* file.
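A minimal sketch of the idea, assuming simplified stand-in classes: Entity and EntityWithExtInfo below are hypothetical and only loosely mirror *AtlasEntity* and *AtlasEntityWithExtInfo*; the file naming is also an assumption. It shows how bundling a *hive_table* with its columns changes the number of files written.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: simplified stand-ins, not the Atlas model classes.
class ExtInfoShapeSketch {

    static final class Entity {
        final String guid;
        final String typeName;

        Entity(String guid, String typeName) {
            this.guid = guid;
            this.typeName = typeName;
        }
    }

    // One primary entity plus the entities it refers to, keyed by guid.
    static final class EntityWithExtInfo {
        final Entity entity;
        final Map<String, Entity> referredEntities = new LinkedHashMap<>();

        EntityWithExtInfo(Entity entity) {
            this.entity = entity;
        }

        void addReferred(Entity referred) {
            referredEntities.put(referred.guid, referred);
        }
    }

    // Builds a hypothetical hive_table that carries its hive_column entities.
    static EntityWithExtInfo tableWithColumns(int columnCount) {
        EntityWithExtInfo table = new EntityWithExtInfo(new Entity("tbl-1", "hive_table"));
        for (int i = 0; i < columnCount; i++) {
            table.addReferred(new Entity("col-" + i, "hive_column"));
        }
        return table;
    }

    // Old layout: one file for the table and one for each column.
    static List<String> fileNamesPerEntity(EntityWithExtInfo table) {
        List<String> names = new ArrayList<>();
        names.add(table.entity.guid + ".json");
        for (String guid : table.referredEntities.keySet()) {
            names.add(guid + ".json");
        }
        return names;
    }

    // New layout: the table and its columns travel together in one file.
    static List<String> fileNamesWithExtInfo(EntityWithExtInfo table) {
        List<String> names = new ArrayList<>();
        names.add(table.entity.guid + ".json");
        return names;
    }

    public static void main(String[] args) {
        EntityWithExtInfo table = tableWithColumns(3);
        System.out.println("per entity:    " + fileNamesPerEntity(table));
        System.out.println("with ext info: " + fileNamesWithExtInfo(table));
    }
}
```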
**Implementation Details**
==========================
*Export API*
- Modified the *Gremlin* query used to fetch connected entities so that it
returns the *guid* together with a *boolean* indicating whether the entity is a
process.
- _ExportService_ Modified implementation to fetch *AtlasEntityWithExtInfo*
instead of *AtlasEntity*. Modified bookkeeping to save *process* (lineage)
entities after all non-process entities are saved.
- _ZipSink_ Minor modification to serialize *AtlasEntityWithExtInfo*.
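The bookkeeping described for _ExportService_ could be sketched roughly as follows. GuidWithFlag and saveOrder are hypothetical names, and the real service's data structures are not shown in this message; the sketch only demonstrates the ordering rule (non-process entities first, process entities last), with duplicate guids dropped as an added assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: orders fetched guids so that process (lineage) entities come last.
class ExportOrderSketch {

    // Hypothetical shape of one query result row: a guid plus an is-process flag.
    static final class GuidWithFlag {
        final String guid;
        final boolean isProcess;

        GuidWithFlag(String guid, boolean isProcess) {
            this.guid = guid;
            this.isProcess = isProcess;
        }
    }

    // Returns guids in save order: all non-process entities, then all process entities.
    // Encounter order is preserved within each group and repeated guids are kept once.
    static List<String> saveOrder(List<GuidWithFlag> fetched) {
        Set<String> nonProcess = new LinkedHashSet<>();
        Set<String> process = new LinkedHashSet<>();
        for (GuidWithFlag row : fetched) {
            if (row.isProcess) {
                process.add(row.guid);
            } else {
                nonProcess.add(row.guid);
            }
        }
        List<String> order = new ArrayList<>(nonProcess);
        order.addAll(process);
        return order;
    }

    public static void main(String[] args) {
        List<GuidWithFlag> fetched = new ArrayList<>();
        fetched.add(new GuidWithFlag("proc-1", true));
        fetched.add(new GuidWithFlag("tbl-1", false));
        fetched.add(new GuidWithFlag("tbl-2", false));
        System.out.println(saveOrder(fetched));
    }
}
```

Saving lineage entities last means the entities a process refers to are already in the archive by the time the process itself is written.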
*Import API*
- _ZipSource_ Modified to source *AtlasEntityWithExtInfo*.
- _EntityImportStream_ Modified to source *AtlasEntityWithExtInfo*.
- _AtlasEntityStreamForImport.getGuid_ Modified to source requested entities
first from stored *AtlasEntityWithExtInfo* object. Request from stream only if
not found.
- _AtlasEntityStoreV1.bulkImport_ Minor modification to work with the updated
stream.
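The lookup order described for _AtlasEntityStreamForImport.getGuid_ can be sketched as a local-first lookup with a fallback. Everything below is a hypothetical stand-in (class, field and method names included); it only illustrates "check the stored object first, request from the stream only if not found".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative only: local-first entity lookup with a fallback to the stream.
class ImportLookupSketch {

    static final class Entity {
        final String guid;
        final String typeName;

        Entity(String guid, String typeName) {
            this.guid = guid;
            this.typeName = typeName;
        }
    }

    // Entities held by the currently loaded entity-with-ext-info object, keyed by guid.
    private final Map<String, Entity> stored = new HashMap<>();

    // Hypothetical fallback that reads an entity from the underlying stream.
    private final Function<String, Entity> streamLookup;

    // Counts how often the fallback was used, to make the behaviour observable.
    int streamRequests = 0;

    ImportLookupSketch(Function<String, Entity> streamLookup) {
        this.streamLookup = streamLookup;
    }

    void store(Entity entity) {
        stored.put(entity.guid, entity);
    }

    // Looks in the stored object first; asks the stream only when the guid is missing.
    Entity getByGuid(String guid) {
        Entity local = stored.get(guid);
        if (local != null) {
            return local;
        }
        streamRequests++;
        return streamLookup.apply(guid);
    }

    public static void main(String[] args) {
        ImportLookupSketch lookup = new ImportLookupSketch(g -> new Entity(g, "from_stream"));
        lookup.store(new Entity("col-1", "hive_column"));
        System.out.println(lookup.getByGuid("col-1").typeName + ", stream requests: " + lookup.streamRequests);
        System.out.println(lookup.getByGuid("tbl-9").typeName + ", stream requests: " + lookup.streamRequests);
    }
}
```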
> Export API: Improve Generated ZIP File Using AtlasEntityWithExtInfo
> -------------------------------------------------------------------
>
> Key: ATLAS-1665
> URL: https://issues.apache.org/jira/browse/ATLAS-1665
> Project: Atlas
> Issue Type: Improvement
> Components: atlas-core
> Affects Versions: 0.9-incubating
> Reporter: Ashutosh Mestry
> Assignee: Ashutosh Mestry
> Fix For: trunk
>
>