Tim Gautier created ZEPPELIN-3727:
-------------------------------------
Summary: Spark commands execute correctly, but log extreme number of errors
Key: ZEPPELIN-3727
URL: https://issues.apache.org/jira/browse/ZEPPELIN-3727
Project: Zeppelin
Issue Type: Bug
Components: Interpreters
Affects Versions: 0.7.3
Reporter: Tim Gautier
I'm running EMR 5.16.0 on AWS. When I run Spark SQL queries against my RDBMS
using the Scala interpreter, they appear to execute fine, but the log file
fills with this exception over and over again:
{noformat}
ERROR [2018-08-16 22:04:36,601] ({pool-2-thread-2} SparkInterpreter.java[getProgressFromStage_1_1x]:1503) - Error on getting progress information
java.lang.NoSuchMethodException: org.apache.zeppelin.spark.SparkInterpreter$1.stageIdToData()
    at java.lang.Class.getMethod(Class.java:1786)
    at org.apache.zeppelin.spark.SparkInterpreter.getProgressFromStage_1_1x(SparkInterpreter.java:1487)
    at org.apache.zeppelin.spark.SparkInterpreter.getProgressFromStage_1_1x(SparkInterpreter.java:1510)
    at org.apache.zeppelin.spark.SparkInterpreter.getProgress(SparkInterpreter.java:1430)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:117)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.getProgress(RemoteInterpreterServer.java:555)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:1762)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:1747)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{noformat}
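For context on the exception type: Class.getMethod throws NoSuchMethodException when the target class has no public method with the requested name and parameter types. The trace above shows that lookup failing for stageIdToData() each time getProgress is called. The following is a minimal, self-contained Java sketch of that behavior only. OldListener, NewListener, and hasNoArgMethod are hypothetical names made up for illustration and are not Zeppelin or Spark code; the only thing taken from the trace is the method name.

```java
import java.lang.reflect.Method;

class ReflectiveLookupDemo {

    // Hypothetical stand-in for a class that still exposes the accessor.
    public static class OldListener {
        public String stageIdToData() {
            return "stage data";
        }
    }

    // Hypothetical stand-in for a class where the accessor does not exist.
    public static class NewListener {
    }

    // Returns true if the target's class has a public no-arg method with this name.
    public static boolean hasNoArgMethod(Object target, String name) {
        try {
            Method m = target.getClass().getMethod(name);
            return m != null;
        } catch (NoSuchMethodException e) {
            // Same exception type as in the log; here it is handled quietly
            // instead of being logged on every call.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgMethod(new OldListener(), "stageIdToData")); // prints true
        System.out.println(hasNoArgMethod(new NewListener(), "stageIdToData")); // prints false
    }
}
```

If a lookup like this is repeated on every progress poll and each failure is logged with a full stack trace, a long-running command would produce the kind of log growth described in this report. Whether that is what happens inside SparkInterpreter is an assumption and has not been verified against its source.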
This simple code triggers it (it queries my own database). I'm not convinced
the problem is specific to Spark SQL, though; I suspect it is related to
long-running commands instead.
{code:java}
import org.apache.spark.sql._

val dbConnectionMap = Map(
  "url" -> "<redacted>",
  "driver" -> "com.mysql.jdbc.Driver"
)
val sql = """(select item_name from product_catalog) as product_catalog"""
val products = spark.read.format("jdbc").options(dbConnectionMap + ("dbtable" -> sql)).load.cache
products.count
{code}
This wouldn't be a big concern, since the execution works, except that after a
couple of hours of analyzing data I started getting file system errors. The
cause turned out to be the log file, which had grown to 33GB and filled the
hard drive.