[
https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16513574#comment-16513574
]
Madhan Neethiraj edited comment on ATLAS-2708 at 6/15/18 9:14 AM:
------------------------------------------------------------------
[~barbara] - thanks for the AWS/S3 model types. Here are my comments:
- it looks like the following types are better modeled as structs instead of
entities, given that each instance will be contained within an instance of
AWSS3Bucket and doesn't need a separate identity outside of its
container:
-- AWSTag
-- AWSCloudWatchMetric
-- AWSS3BucketLifeCycleRule
- is avroSchema applicable for AWSS3PseudoDir?
- consider renaming attribute AWSTag.tag to AWSTag.key, to be consistent with
the names used in
https://docs.aws.amazon.com/AmazonS3/latest/dev/object-tagging.html
- I suggest using attribute names that begin with a lowercase letter, to be
consistent with the rest of the types; for example, AWSS3Bucket.S3AccessPolicy
and AWSS3Bucket.AWSTags
- AWSS3Object has an array of avro_schema associated with it. Wouldn't a single
avro_schema be enough?
I updated the model files to address the above comments, except the last one, and
uploaded them to this JIRA - [^3010-aws_model.json]. Please review.
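For illustration, here is a minimal Python sketch of how the struct-vs-entity split and the lowercase attribute names suggested above might look when written out in the Atlas v2 typedef JSON layout (structDefs/entityDefs, each with attributeDefs). It is not the content of 3010-aws_model.json: the collection attribute names (lifeCycleRules, cloudWatchMetrics, awsTags), the days attribute, the string type for s3AccessPolicy and the DataSet supertype are assumptions chosen for the example.

```python
import json

def attr(name, type_name, optional=True):
    """Build one attributeDef dict in the Atlas v2 typedef JSON layout."""
    return {
        "name": name,
        "typeName": type_name,
        "isOptional": optional,
        "cardinality": "SINGLE",
        "isUnique": False,
        "isIndexable": False,
    }

# Value-like types become structDefs: each instance lives inside a bucket
# and has no identity (GUID) of its own.
struct_defs = [
    {"name": "AWSTag",
     "attributeDefs": [attr("key", "string", optional=False), attr("value", "string")]},
    {"name": "AWSCloudWatchMetric",
     "attributeDefs": [attr("metricName", "string", optional=False), attr("scope", "string")]},
    {"name": "AWSS3BucketLifeCycleRule",
     "attributeDefs": [attr("ruleType", "string", optional=False),
                       attr("days", "int"),               # assumed name for the interval
                       attr("storageClass", "string")]},  # null for expiration rules
]

# The bucket stays an entity and holds the structs by value in array attributes.
entity_defs = [
    {"name": "AWSS3Bucket",
     "superTypes": ["DataSet"],  # assumption: bucket modeled as a DataSet
     "attributeDefs": [
         attr("region", "string"),
         attr("isEncrypted", "boolean"),
         attr("encryptionType", "string"),
         attr("s3AccessPolicy", "string"),  # assumption: policy JSON kept as text
         attr("lifeCycleRules", "array<AWSS3BucketLifeCycleRule>"),
         attr("cloudWatchMetrics", "array<AWSCloudWatchMetric>"),
         attr("awsTags", "array<AWSTag>"),
     ]},
]

typedefs = {"structDefs": struct_defs, "entityDefs": entity_defs}

def non_lowercase_attr_names(defs):
    """Return 'Type.attr' for every attribute whose name starts with an uppercase letter."""
    return [
        f'{d["name"]}.{a["name"]}'
        for group in defs.values()
        for d in group
        for a in d["attributeDefs"]
        if a["name"][:1].isupper()
    ]

if __name__ == "__main__":
    print(json.dumps(typedefs, indent=2))
```

Keeping AWSTag, AWSCloudWatchMetric and AWSS3BucketLifeCycleRule as structs means they are stored by value inside the bucket entity, so there are no separate entity GUIDs to manage for them; the non_lowercase_attr_names helper is just a quick way to catch names like S3AccessPolicy or AWSTags before uploading a model file.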
> AWS S3 data lake typedefs for Atlas
> -----------------------------------
>
> Key: ATLAS-2708
> URL: https://issues.apache.org/jira/browse/ATLAS-2708
> Project: Atlas
> Issue Type: New Feature
> Components: atlas-core
> Reporter: Barbara Eckman
> Assignee: Barbara Eckman
> Priority: Critical
> Attachments: 3010-aws_model.json, all_AWS_common_typedefs.json,
> all_datalake_typedefs.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It
> would be nice to add typedefs for AWS data lake objects (buckets and
> pseudo-directories) and lineage processes that move the data from another
> source (e.g., a Kafka topic) to the data lake. For example:
> * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in
> an S3 bucket. For example, in the case of an object with key
> “myWork/Development/Projects1.xls”, “myWork/Development” is the
> pseudo-directory. It supports:
> ** Array of Avro schemas that are associated with the data in the
> pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
> ** what type of data it contains, e.g., Avro, JSON, unstructured
> ** time of creation
> * AWSS3BucketLifeCycleRule type represents a rule that, after a specified time
> interval, either transitions the data in a bucket to another storageClass or
> expires it. For example, transition to GLACIER after 60 days, or expire
> (i.e., delete) after 90 days:
> ** ruleType (e.g., transition or expiration)
> ** time interval in days before rule is executed
> ** storageClass to which the data is transitioned (null if ruleType is
> expiration)
> * AWSTag type represents a tag-value pair created by the user and associated
> with an AWS object.
> ** tag
> ** value
> * AWSCloudWatchMetric type represents a storage or request metric that is
> monitored by AWS CloudWatch and can be configured for a bucket:
> ** metricName, for example, “AllRequests”, “GetRequests”,
> “TotalRequestLatency”, “BucketSizeBytes”
> ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or
> limit the monitoring of the metric.
> * AWSS3Bucket type represents a bucket in an S3 instance. It supports:
> ** Array of AWSS3PseudoDirectories that are associated with objects stored
> in the bucket
> ** AWS region
> ** IsEncrypted (boolean)
> ** encryptionType, e.g., AES-256
> ** S3AccessPolicy, a JSON object expressing access policies, e.g., GetObject,
> PutObject
> ** time of creation
> ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket
> ** Array of AWSCloudWatchMetrics that are associated with the bucket or
> its tags or prefixes
> ** Array of AWSTags that are associated with the bucket
> * Generic dataset2Dataset process to represent movement of data from one
> dataset to another. It supports:
> ** array of transforms performed by the process
> ** map of tag/value pairs representing configurationParameters of the process
> ** inputs and outputs are arrays of dataset objects, e.g., a Kafka topic and an
> S3 pseudo-directory.
>
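Two pieces of the proposal above carry small rules that are easy to pin down in code: how an object key maps to its pseudo-directory, and when a lifecycle rule should or should not carry a storageClass. The Python sketch below encodes both under stated assumptions: a key with no "/" is treated as living at the bucket root (empty prefix), ruleType is limited to "transition" and "expiration", and pseudo_dir and LifeCycleRule are hypothetical helper names, not part of Atlas or any AWS SDK.

```python
from dataclasses import dataclass
from typing import Optional

def pseudo_dir(key: str) -> str:
    """Return the S3 'pseudo-directory' prefix of an object key.

    Everything before the last '/' is the prefix; a key with no '/'
    sits at the bucket root, represented here as an empty string.
    """
    head, sep, _ = key.rpartition("/")
    return head if sep else ""

@dataclass(frozen=True)
class LifeCycleRule:
    """Hypothetical in-memory mirror of AWSS3BucketLifeCycleRule."""
    rule_type: str                       # "transition" or "expiration"
    days: int                            # interval before the rule is executed
    storage_class: Optional[str] = None  # None when rule_type is "expiration"

    def __post_init__(self) -> None:
        # Enforce the pairing described in the issue: storageClass is set
        # for transitions and null for expirations.
        if self.rule_type not in ("transition", "expiration"):
            raise ValueError(f"unknown ruleType: {self.rule_type}")
        if self.days < 0:
            raise ValueError("days must be non-negative")
        if self.rule_type == "transition" and self.storage_class is None:
            raise ValueError("a transition rule needs a target storageClass")
        if self.rule_type == "expiration" and self.storage_class is not None:
            raise ValueError("an expiration rule must not set storageClass")
```

For the examples in the description, pseudo_dir("myWork/Development/Projects1.xls") returns "myWork/Development", and the two sample rules are LifeCycleRule("transition", 60, "GLACIER") and LifeCycleRule("expiration", 90); an expiration rule that also names a storageClass is rejected with a ValueError.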