[ 
https://issues.apache.org/jira/browse/ATLAS-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hemanth Yamijala updated ATLAS-616:
-----------------------------------
    Attachment: no-dsl-1000-14400-10g-heap.png

I have isolated the problem to DSL queries and the underlying Gremlin Groovy 
script engine we use. Several observations support this:

* Removing DSL queries from the test plan results in stable GC logs and tenured 
heap usage. The attached image (no-dsl-1000-14400-10g-heap.png) shows how the 
tenured heap keeps shrinking back to a stable level with every GC.
* Profiling with YourKit shows that most of the largest objects are Maps and 
Map entries, all weakly referenced by the script engine and related 
objects.
* In {{GremlinEvaluator.scala}}, I made the ScriptManager and Engine static 
objects. That improved the situation but did not completely fix 
the problem.
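
To illustrate the last point, here is a minimal sketch of the "one shared 
engine" pattern in Java. It is not the actual change to 
{{GremlinEvaluator.scala}}: {{StubEngine}} and {{SharedEngineDemo}} are 
hypothetical stand-ins, not part of Gremlin or Atlas, and the per-call variant 
is included only for contrast (whether the original code built an engine per 
call is an assumption).

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive script engine; NOT the Gremlin API.
final class StubEngine {
    private static final AtomicInteger CREATED = new AtomicInteger();

    StubEngine() {
        CREATED.incrementAndGet(); // count how many engines have been built
    }

    // Pretends to evaluate a query; a real engine would compile and run it.
    String eval(String query) {
        return "evaluated: " + query;
    }

    static int created() {
        return CREATED.get();
    }
}

final class SharedEngineDemo {
    // One engine for the whole JVM, built once when this class is initialized.
    private static final StubEngine SHARED = new StubEngine();

    static StubEngine shared() {
        return SHARED;
    }

    // Reuses the single shared engine for every query.
    static String evalShared(String query) {
        return SHARED.eval(query);
    }

    // Contrast case (assumed): a fresh engine per call, each with its own state.
    static String evalPerCall(String query) {
        return new StubEngine().eval(query);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            evalShared("query " + i);
        }
        System.out.println("engines created: " + StubEngine.created());
    }
}
```

Sharing one instance only bounds the number of engines. It does not limit what 
a single engine caches internally, which would be consistent with the change 
helping without fully fixing the problem. Whether a shared engine is safe under 
concurrent evaluation depends on the real engine and is not modelled by the 
stub.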

Does anyone know of issues with GremlinGroovyScriptEngine, or how it should be 
used correctly?

> Zookeeper throws exceptions when trying to fire DSL queries at Atlas at large 
> scale. 
> -------------------------------------------------------------------------------------
>
>                 Key: ATLAS-616
>                 URL: https://issues.apache.org/jira/browse/ATLAS-616
>             Project: Atlas
>          Issue Type: Bug
>         Environment: Atlas with External kafka / HBase / Solr
> The test is run on cluster setup.
> Machine 1 - Atlas , Solr
> Machine 2 - Kafka , HBase
> Machine 3 - Hive , client
>            Reporter: Sharmadha Sainath
>            Assignee: Hemanth Yamijala
>         Attachments: baseline-1000-3600-10g-heap.png, 
> no-dsl-1000-14400-10g-heap.png, zk-exception-stacktrace.rtf
>
>
> The test plan simulates 'n' users firing 'm' queries at Atlas 
> simultaneously. This is accomplished with the help of Apache JMeter.
> Atlas is populated with 10,000 tables. 
> • 6000 small sized tables (10 columns)
> • 3000 medium sized tables (50 columns)
> • 1000 large sized tables (100 columns)
> The test plan consists of 30 users firing a set of 3 queries in a loop 
> 20 times. Added -Xmx10240m -XX:MaxPermSize=512m to ATLAS_OPTS. 
> ZooKeeper throws exceptions when the test plan is run and JMeter starts 
> firing queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
