[ https://issues.apache.org/jira/browse/ATLAS-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998538#comment-16998538 ]
Bolke de Bruin edited comment on ATLAS-3254 at 12/17/19 8:08 PM:
-----------------------------------------------------------------

[~mayank_nj], what do you consider to be loading “properly”? How long does it take to show the properties? What is the size of the JSON sent over the network (ours is > 27 MB)? What is the load time? What is the render time?

Are you saying that loading 200K objects in a pseudodir takes over 1h? That is not “proper”, I think.

was (Author: bolke):
[~mayank_nj], what do you consider to be loading “properly”? How long does it take to show the properties? What is the size of the JSON sent over the network (ours is > 27 MB)? What is the load time? What is the render time?

> Atlas entity with large array of refs causes performance issues for lineage
> ---------------------------------------------------------------------------
>
>                 Key: ATLAS-3254
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3254
>             Project: Atlas
>          Issue Type: Bug
>          Components:  atlas-core, atlas-webui
>    Affects Versions: 1.0.0, 2.0.0
>            Reporter: Adam Rempter
>            Assignee: Mayank Jain
>            Priority: Major
>              Labels: performance
>         Attachments: Screenshot 2019-11-28 at 21.18.44.png,
> entity_auto_create.sh, example_create_entities.json,
> rest_entity_get_pseudodir.json
>
>
> We use the “aws_s3_pseudo_dir” type from the 3020-aws_s3_typedefs.json model.
> It has the following property:
>     "name": "s3Objects",
>     "typeName": "array<aws_s3_object>"
>
> AWS buckets can hold thousands of objects, so the s3Objects array grows
> quickly and the aws_s3_pseudo_dir entity JSON easily reaches a few MB.
>
> Then we start seeing problems like:
> * The UI dies when displaying entity properties or lineage
> * Error in logs: audit record too long: entityType=aws_s3_pseudo_dir,
> guid=24398271-6ba0-4db5-adfa-38e432dc55ce, size=1053931; maxSize=1048576.
> entity attribute values not stored in audit (EntityAuditListenerV2:234)
> * Some errors when writing to HBase (java.lang.IllegalArgumentException:
> KeyValue size too large; as a workaround we set the
> hbase.client.keyvalue.maxsize param to 0)
> * Kafka consumer errors (we can of course set some parameters on the
> consumer, but I think that is just a workaround)
> …
> Exception in NotificationHookConsumer (NotificationHookConsumer:332)
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be
> completed since the group has already rebalanced and assigned the partitions
> to another member. This means that the time between subsequent calls to
> poll() was longer than the configured max.poll.interval.ms, which typically
> implies that the poll loop is spending too much time message processing. You
> can address this either by increasing the session timeout or by reducing the
> maximum size of batches returned in poll() with max.poll.records.
> …
> Specifying pseudo_dir is required for s3objects:
>     "name": "pseudoDirectory",
>     "typeName": "aws_s3_pseudo_dir",
>     "cardinality": "SINGLE",
>     "isIndexable": false,
>     *"isOptional": false,*
>     "isUnique": false,
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
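
To put numbers on the payload-size question raised in the comment, here is a rough sketch of how one might estimate the serialized size of a pseudo-dir entity as its s3Objects array grows. It only measures the client-side JSON encoding; how Atlas itself computes the audit record size is not shown in this issue, so the comparison against maxSize=1048576 (taken from the log line quoted above) is an approximation. The helper names (entity_size_report, make_pseudo_dir), the placeholder GUIDs, and the "attributes" layout of the entity dict are assumptions made for illustration.

```python
import json

# Limit copied from the audit error quoted in the issue (maxSize=1048576).
AUDIT_MAX_SIZE = 1048576


def entity_size_report(entity, ref_attr="s3Objects", max_size=AUDIT_MAX_SIZE):
    """Report the UTF-8 JSON size and reference count of an entity dict."""
    payload = json.dumps(entity).encode("utf-8")
    # Assumed layout: references live under entity["attributes"][ref_attr].
    refs = entity.get("attributes", {}).get(ref_attr) or []
    return {
        "size_bytes": len(payload),
        "ref_count": len(refs),
        "exceeds_audit_limit": len(payload) > max_size,
    }


def make_pseudo_dir(n_objects):
    """Build a hypothetical aws_s3_pseudo_dir entity with n_objects refs."""
    return {
        "typeName": "aws_s3_pseudo_dir",
        "attributes": {
            "s3Objects": [
                {
                    # Placeholder GUIDs, not real Atlas identifiers.
                    "guid": "00000000-0000-0000-0000-%012d" % i,
                    "typeName": "aws_s3_object",
                }
                for i in range(n_objects)
            ]
        },
    }


if __name__ == "__main__":
    for n in (100, 20000, 200000):
        print(n, entity_size_report(make_pseudo_dir(n)))
```

Running the same report against a real response (for example the attached rest_entity_get_pseudodir.json, loaded with json.load) would give the actual size and reference count rather than this synthetic estimate.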