[
https://issues.apache.org/jira/browse/SPARK-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572205#comment-14572205
]
Fi commented on SPARK-7819:
---------------------------
Hello, sorry for not responding sooner, been quite hectic at work.
We have a smoke test that I run whenever I'm testing a new Spark custom build.
Basically it's a Python script that tests various parts of the Spark API.
During the course of the execution, several SparkContexts are created, as are
HiveContext and SQLContext wrappers.
The test is rather light, but it does a decent job of giving me a heads up when
an API changes underneath me so I can give our developers fair warning. :)
It does things like reading/writing parquet files, reading/writing files to
MapRFS, word count jobs, Hive queries, DataFrame API calls, etc.
It also serves as a light benchmark suite, so that I can keep an eye on
performance regressions that may have been introduced by the Spark
distribution, or by regular operational shenanigans on our Mesos cluster.
The test takes a simple 4-node dev/integration cluster about 200 seconds to
run, moving around 100 GB of data from a non-local MapRFS cluster via raw
textFile and HiveContext/SQLContext queries.
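The attached test.py is the real script; purely for illustration, a timed check-runner of the kind described above might look like this minimal, hypothetical sketch (all names are invented, and the actual Spark API calls — parquet I/O, textFile word counts, Hive queries — are replaced here by trivial stand-ins):

```python
import time

def run_smoke_checks(checks):
    """Run each named check, timing it; collect failures instead of aborting.

    `checks` is a list of (name, zero-arg callable) pairs. Returns a list of
    (name, seconds, error_or_None) tuples, so a single pass doubles as a
    light benchmark as well as a smoke test.
    """
    results = []
    for name, check in checks:
        start = time.monotonic()
        error = None
        try:
            check()
        except Exception as exc:  # record the failure and keep going
            error = exc
        results.append((name, time.monotonic() - start, error))
    return results

# In the real script each callable would wrap a Spark API call;
# two trivial stand-ins illustrate the shape of the harness.
results = run_smoke_checks([
    ("word_count", lambda: sum(len(line.split()) for line in ["a b", "c"])),
    ("always_fails", lambda: 1 / 0),
])
```

A harness like this is what lets one failing step (such as the OOM below) surface without hiding the results of the other checks.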
Anyway, per my last comment, we ran out of PermGen in this script.
I created an even newer Spark 1.4 build, git
84da653192a2d9edb82d0dbe50f577c4dc6a0c78 and deployed it to our test cluster.
I then updated the spark-defaults.conf per your suggestions, as well as
increasing the JVM PermGen settings:
spark.sql.hive.metastore.sharedPrefixes
com.mysql.jdbc,org.postgresql,com.microsoft.sqlserver,oracle.jdbc,com.mapr.fs.shim.LibraryLoader,com.mapr.security.JNISecurity,com.mapr.fs.jni
spark.driver.extraJavaOptions -XX:+CMSClassUnloadingEnabled
-XX:+CMSPermGenSweepingEnabled -XX:MaxPermSize=512M
I'm not sure if CMSClassUnloadingEnabled and CMSPermGenSweepingEnabled are
needed. I came across these settings on StackOverflow, and it sounded like
they wouldn't hurt, considering what the Isolated Hive Client Loader might be
trying to do.
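As I understand it, the sharedPrefixes setting tells the isolated Hive client loader to delegate classes whose names match one of the prefixes to the shared (root) classloader, so the MapR classes that pull in the native library are only ever loaded once. A rough Python sketch of just the prefix check (the function name here is invented; the real logic lives in Spark's Scala IsolatedClientLoader):

```python
# Hypothetical sketch of the prefix test behind
# spark.sql.hive.metastore.sharedPrefixes: classes whose fully qualified
# name starts with a shared prefix go to the shared classloader (so their
# native libraries load only once); everything else is loaded in the
# isolated Hive client classloader.
SHARED_PREFIXES = [
    "com.mysql.jdbc", "org.postgresql", "com.microsoft.sqlserver",
    "oracle.jdbc", "com.mapr.fs.shim.LibraryLoader",
    "com.mapr.security.JNISecurity", "com.mapr.fs.jni",
]

def is_shared_class(class_name, prefixes=SHARED_PREFIXES):
    """Return True if the class should be delegated to the shared loader."""
    return any(class_name.startswith(p) for p in prefixes)
```

With the MapR prefixes listed, the JNI-backed classes stay in one classloader, which is what avoids the "already loaded in another classloader" error from the original report.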
Incidentally, I typically run this smoke test script as an IPython Notebook;
this lets me also do smoke tests on non-Spark-related APIs (such as
matplotlib).
With the above settings, I was able to get through the smoke test without
errors.
Just for kicks, I ran it a second time (WITHIN the same running kernel),
hoping (or not) to see an OOM.
It worked! So a third time, and it still worked.
I kicked it off a fourth time (still within the same ipython kernel) and was
about to declare this a success, when the script failed with an
InvalidClassCastException (attached).
Very strange! Not sure what could cause it.
Anyway, I tried a fifth time (still within the same kernel), and it passed just
fine.
Considering the smoke tests worked fine 4 out of 5 times, I'm satisfied
enough, and will chalk this up as some flakiness in the JVM and all the funky
class loading. Also, did I mention that this IPython Notebook is also running
in a Docker container on a Xen hypervisor VM? Maybe that had something to do
with it. :)
So it would appear that increasing the PermGen space should be highly
recommended (and maybe a default stock setting) in order to avoid the PermGen
OOM error.
> Isolated Hive Client Loader appears to cause Native Library
> libMapRClient.4.0.2-mapr.so already loaded in another classloader error
> -----------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-7819
> URL: https://issues.apache.org/jira/browse/SPARK-7819
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.4.0
> Reporter: Fi
> Priority: Critical
> Attachments: invalidClassException.log, stacktrace.txt, test.py
>
>
> In reference to the pull request: https://github.com/apache/spark/pull/5876
> I have been running the Spark 1.3 branch for some time with no major hiccups,
> and recently switched to the Spark 1.4 branch.
> I build my spark distribution with the following build command:
> {noformat}
> make-distribution.sh --tgz --skip-java-test --with-tachyon -Phive
> -Phive-0.13.1 -Pmapr4 -Pspark-ganglia-lgpl -Pkinesis-asl -Phive-thriftserver
> {noformat}
> When running a python script containing a series of smoke tests I use to
> validate the build, I encountered an error under the following conditions:
> * start a spark context
> * start a hive context
> * run any hive query
> * stop the spark context
> * start a second spark context
> * run any hive query
> ** ERROR
> From what I can tell, the Isolated Class Loader is hitting a MapR class that
> is loading its native library (presumably as part of a static initializer).
> Unfortunately, the JVM prohibits this the second time around.
> I would think that shutting down the SparkContext would clear out any
> vestiges of it in the JVM, so I'm surprised that this would even be a
> problem.
> Note: all other smoke tests we are running pass fine.
> I will attach the stacktrace and a python script reproducing the issue (at
> least for my environment and build).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)