Re: Zeppelin with external cluster

Jongyoul Lee Thu, 29 Jan 2015 05:58:49 -0800

Hi Kevin,

I was mistaken. Spark 1.2 needs snappyjava because Snappy is the default
compression codec in Spark 1.2. That's all.


My workaround is to start Zeppelin with the LZF codec instead:
JAVA_OPTS="-Dspark.io.compression.codec=lzf" ./zeppelin-daemon.sh start
I've also found that we don't have to set SPARK_CLASSPATH: it has the same
value as ZEPPELIN_CLASSPATH, which in turn is the same as CLASSPATH.
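A minimal sketch of the same workaround, assuming bash and that zeppelin-daemon.sh honors JAVA_OPTS as described above. The helper name add_java_opt is hypothetical (not part of Zeppelin or Spark); it appends the LZF codec flag without overwriting options that may already be in JAVA_OPTS.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a JVM flag to an existing JAVA_OPTS value,
# skipping it if the exact same flag is already present.
add_java_opt() {
  local current="$1" flag="$2"
  case " $current " in
    *" $flag "*) printf '%s\n' "$current" ;;           # already set, keep as-is
    *) printf '%s\n' "${current:+$current }$flag" ;;   # append without a leading space
  esac
}

# Use LZF instead of Spark 1.2's default Snappy codec, so executors should
# not need to load the native snappyjava library.
JAVA_OPTS="$(add_java_opt "${JAVA_OPTS:-}" -Dspark.io.compression.codec=lzf)"
export JAVA_OPTS
# Then start Zeppelin from its bin/ directory:
# ./zeppelin-daemon.sh start
```

The helper only detects an exact duplicate flag; if a different codec is already set in JAVA_OPTS, both flags end up on the command line.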

I have a new question. Is there any way to set "spark.*" properties? Normally,
spark-submit reads spark-defaults.conf, but because Zeppelin doesn't use
spark-submit, we cannot set spark.* values except as Java system
properties.
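One possible sketch for the question above, assuming bash and a spark-defaults.conf that uses simple whitespace-separated "key value" lines: convert each spark.* entry into a -Dkey=value flag for JAVA_OPTS. By default SparkConf loads Java system properties whose names start with "spark.", which is why this route works at all. The function name is hypothetical, and values containing spaces (for example spark.executor.extraJavaOptions) are not handled safely, because JAVA_OPTS is typically word-split when the java command line is built.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Zeppelin or Spark): turn "key value" lines
# from a spark-defaults.conf-style file into "-Dkey=value" JVM flags.
# Limitation: values containing spaces are not quoted.
spark_conf_to_java_opts() {
  local conf_file="$1" opts="" key value
  # "|| [ -n "$key" ]" also handles a last line without a trailing newline.
  while read -r key value || [ -n "$key" ]; do
    case "$key" in
      ''|\#*) continue ;;   # blank line or comment
      spark.*) ;;           # SparkConf only loads spark.* system properties
      *) continue ;;        # ignore anything else
    esac
    opts="$opts -D$key=$value"
  done < "$conf_file"
  printf '%s\n' "${opts# }"  # drop the leading space
}
```

Usage, assuming SPARK_HOME points at the Spark installation: JAVA_OPTS="$(spark_conf_to_java_opts "$SPARK_HOME/conf/spark-defaults.conf")" ./zeppelin-daemon.sh start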

Regards,
JL

On Thu, Jan 29, 2015 at 5:18 PM, Kevin Kim (Sangwoo) <[email protected]>
wrote:

> Cool, please send a pull request; I'll look into it!
>
> On Thu Jan 29 2015 at 2:41:41 PM Jongyoul Lee <[email protected]> wrote:
>
> > Hi Kevin,
> >
> > ADD_JARS is a good way to solve it. But snappyjava is a dependency of
> > Zeppelin itself, so Zeppelin should add its own jars in an appropriate
> > way. My PR is about adding jars and should be a very small change.
> >
> > Regards,
> > JL
> >
> > On Thu, Jan 29, 2015 at 2:34 PM, Kevin (Sangwoo) Kim <[email protected]>
> > wrote:
> >
> > > Well, for me, when I need to supply external libraries, I use
> > > export ADD_JARS="~~~.jar"
> > > export ZEPPELIN_CLASSPATH="~~~.jar"
> > > in zeppelin-env.sh,
> > >
> > > and ADD_JARS="~~~.jar" in spark-env.sh for the Spark clusters (the
> > > library jar is deployed across all clusters).
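A small sketch of the setup described above, assuming bash: build both variables from one list of jars. join_by and the jar paths are hypothetical placeholders. ADD_JARS is read as a comma-separated list (as in the Spark 1.x shell), whereas a JVM classpath such as ZEPPELIN_CLASSPATH is colon-separated on Unix, so the same list has to be joined two different ways.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join the remaining arguments with the separator in $1.
join_by() {
  local sep="$1" out="" item
  shift
  for item in "$@"; do
    out="${out:+$out$sep}$item"   # add the separator only between items
  done
  printf '%s\n' "$out"
}

# Hypothetical jar locations; the same paths must exist on every node.
jars=(/opt/libs/foo.jar /opt/libs/bar.jar)
export ADD_JARS="$(join_by , "${jars[@]}")"            # comma-separated
export ZEPPELIN_CLASSPATH="$(join_by : "${jars[@]}")"  # colon-separated
```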
> > >
> > > I want to note that the config I'm using is quite old and deprecated,
> > > so I'm testing #308 to replace it.
> > >
> > > Of course, contributions are always welcome. If the code is simple, it
> > > would be great to supply it via a PR; if the code is large, it would be
> > > good to discuss it before writing it.
> > >
> > > Regards,
> > > Kevin
> > >
> > >
> > > On Thu Jan 29 2015 at 2:21:34 PM Jongyoul Lee <[email protected]>
> > > wrote:
> > >
> > > > I'm resending this email because my attachment was larger than
> > > > 1000000 bytes.
> > > >
> > > >
> > > > ---------- Forwarded message ----------
> > > > From: Jongyoul Lee <[email protected]>
> > > > Date: Thu, Jan 29, 2015 at 2:14 PM
> > > > Subject: Re: Zeppelin with external cluster
> > > > To: [email protected]
> > > >
> > > >
> > > > Hi Kevin,
> > > >
> > > > I also changed master to spark://dicc-m002:7077. Actually, I think
> > > > interpreter.json determines which cluster is used when running code.
> > > > Anyway, my interpreter screenshot is below, and my error is as follows:
> > > >
> > > > org.apache.spark.SparkException: Job aborted due to stage failure: Task 1
> > > > in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage
> > > > 0.0 (TID 6, DICc-r1n029): java.lang.UnsatisfiedLinkError: no snappyjava
> > > > in java.library.path
> > > >     at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1886)
> > > >     at java.lang.Runtime.loadLibrary0(Runtime.java:849)
> > > >     at java.lang.System.loadLibrary(System.java:1088)
> > > >     at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:170)
> > > >     at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:145)
> > > >     at org.xerial.snappy.Snappy.<clinit>(Snappy.java:47)
> > > >     at org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:358)
> > > >     at org.xerial.snappy.SnappyInputStream.rawRead(SnappyInputStream.java:167)
> > > >     at org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:150)
> > > >     at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2310)
> > > >     at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2323)
> > > >     at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
> > > >     at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
> > > >     at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
> > > >     at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.<init>(JavaSerializer.scala:57)
> > > >     at org.apache.spark.serializer.JavaDeserializationStream.<init>(JavaSerializer.scala:57)
> > > >     at org.apache.spark.serializer.JavaSerializerInstance.deserializeStream(JavaSerializer.scala:95)
> > > >     at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:215)
> > > >     at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177)
> > > >     at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1000)
> > > >     at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
> > > >     at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
> > > >     at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
> > > >     at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
> > > >     at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
> > > >     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
> > > >     at org.apache.spark.scheduler.Task.run(Task.scala:56)
> > > >     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
> > > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > >     at java.lang.Thread.run(Thread.java:744)
> > > > Driver stacktrace:
> > > >     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
> > > >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
> > > >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
> > > >     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> > > >     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> > > >     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
> > > >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
> > > >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
> > > >     at scala.Option.foreach(Option.scala:236)
> > > >     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696)
> > > >     at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420)
> > > >     at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
> > > >     at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
> > > >     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
> > > >     at akka.actor.ActorCell.invoke(ActorCell.scala:487)
> > > >     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
> > > >     at akka.dispatch.Mailbox.run(Mailbox.scala:220)
> > > >     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
> > > >     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> > > >     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> > > >     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> > > >     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> > > >
> > > > I think this error is related to the classpath. I'm running Zeppelin
> > > > under /home/1001079/apache-zeppelin, which means all of its classes are
> > > > located under this directory. Because Zeppelin adds its classes to
> > > > SPARK_CLASSPATH, if a slave doesn't have those libraries at the same
> > > > path, a class-not-found error may occur.
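To check the hypothesis above, one could run a sketch like the following on each slave, passing it the same classpath the driver uses; it prints each entry that does not exist at that path on the machine. It assumes bash; the function name is hypothetical, and wildcard entries such as dir/* are checked literally rather than expanded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every entry of a colon-separated classpath that
# does not exist on this machine; returns 1 if any entry is missing.
missing_classpath_entries() {
  local -a entries
  local entry status=0
  # Split on ':' into an array (read -a does not glob-expand the entries).
  IFS=: read -r -a entries <<< "$1"
  for entry in "${entries[@]}"; do
    [ -n "$entry" ] || continue          # skip empty segments such as "a::b"
    if [ ! -e "$entry" ]; then
      printf '%s\n' "$entry"
      status=1
    fi
  done
  return "$status"
}
```

For example, run missing_classpath_entries "$ZEPPELIN_CLASSPATH" on a slave with the value copied from the Zeppelin host; empty output means every path is present.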
> > > >
> > > > I want to contribute by fixing this issue. Could you please tell me
> > > > the usual steps for dealing with an issue? Or is it OK to make a PR
> > > > without a JIRA issue?
> > > >
> > > > Regards,
> > > > JL
> > > >
> > > > On Thu, Jan 29, 2015 at 1:55 PM, Kevin (Sangwoo) Kim <[email protected]>
> > > > wrote:
> > > >
> > > >> Hi Jongyoul,
> > > >> I'm using Zeppelin with external cluster.
> > > >> (standalone mode)
> > > >>
> > > >> All I needed to do was set the master, like
> > > >> export MASTER="spark://IP-ADDRESS:7077"
> > > >> in $ZEPPELIN/conf/zeppelin-env.sh
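A hypothetical sanity check for the MASTER value, assuming bash. A standalone master URL has the form spark://HOST:PORT (the standalone master listens on 7077 by default); this sketch handles only a single master, not a comma-separated list of masters for high availability.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate a spark://HOST:PORT URL and print "HOST PORT".
check_standalone_master() {
  local url="$1" hostport host port
  case "$url" in
    spark://*) hostport="${url#spark://}" ;;
    *) echo "not a spark:// URL: $url" >&2; return 1 ;;
  esac
  host="${hostport%:*}"     # text before the last ':'
  port="${hostport##*:}"    # text after the last ':'
  if [ -z "$host" ] || [ "$host" = "$hostport" ]; then
    echo "missing host or port in: $url" >&2; return 1
  fi
  case "$port" in
    ''|*[!0-9]*) echo "port is not numeric in: $url" >&2; return 1 ;;
  esac
  printf '%s %s\n' "$host" "$port"
}

# Example with the master used later in this thread:
check_standalone_master "spark://dicc-m002:7077"   # prints: dicc-m002 7077
```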
> > > >>
> > > >> If your error persists, please post the error message in a reply!
> > > >> I'll look into it.
> > > >>
> > > >> Regards,
> > > >> Kevin
> > > >>
> > > >>
> > > >> On Thu Jan 29 2015 at 12:58:41 PM Jongyoul Lee <[email protected]>
> > > >> wrote:
> > > >>
> > > >> > Hi dev,
> > > >> >
> > > >> > I've succeeded in running Zeppelin with Spark 1.2. Thanks, Moon. Now
> > > >> > I'm trying to use Zeppelin with an external cluster. Yesterday I
> > > >> > tested with standalone and Mesos, but the results were not good. In
> > > >> > the standalone case, a "no snappyjava" error occurs, and in the Mesos
> > > >> > case, nothing happens. Do you have any reference for running Zeppelin
> > > >> > with an external cluster? If there isn't one, I can write a reference
> > > >> > for running with an external cluster.
> > > >> >
> > > >> > Regards,
> > > >> > JL
> > > >> >
> > > >> > --
> > > >> > 이종열, Jongyoul Lee, 李宗烈
> > > >> > http://madeng.net
> > > >> >
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > 이종열, Jongyoul Lee, 李宗烈
> > > > http://madeng.net
> > > >
> > >
> >
> >
> >
> > --
> > 이종열, Jongyoul Lee, 李宗烈
> > http://madeng.net
> >
>



-- 
이종열, Jongyoul Lee, 李宗烈
http://madeng.net
