That first URL is not the file. It's a web page with links to the file
in different mirrors. I just looked at the actual file in one of the
mirrors and it looks fine.
On Mon, Dec 30, 2019 at 1:34 PM rsinghania wrote:
>
> Hi,
>
> I'm trying to open the file
>
BTW the SparkLauncher API has hooks to capture the stderr of the
spark-submit process into the logging system of the parent process.
Check the API javadocs since it's been forever since I looked at that.
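A rough sketch of what that can look like (not compiled here; assumes the Spark 2.x launcher API on the classpath, and the jar path, class name, and logger name are all placeholders):

```java
import org.apache.spark.launcher.SparkLauncher;

public class LaunchWithLogs {
  public static void main(String[] args) throws Exception {
    Process spark = new SparkLauncher()
        .setAppResource("/path/to/app.jar")   // placeholder
        .setMainClass("com.example.MyApp")    // placeholder
        // Pipes the spark-submit child's stdout/stderr into
        // java.util.logging under the given logger name.
        .redirectToLog("spark-submit")
        .launch();
    spark.waitFor();
  }
}
```

As the reply says, check the javadocs for the exact redirect methods available in your Spark version.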
On Wed, Apr 24, 2019 at 1:58 PM Marcelo Vanzin wrote:
>
> S
Setting the SPARK_PRINT_LAUNCH_COMMAND env variable to 1 in the
launcher env will make Spark code print the command to stderr. Not
optimal, but I think it's the only current option.
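For example, from a shell (class and jar names are placeholders):

```shell
# Makes the launcher print the full java command it builds to stderr
# before starting the application.
SPARK_PRINT_LAUNCH_COMMAND=1 spark-submit --class com.example.MyApp app.jar
```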
On Wed, Apr 24, 2019 at 1:55 PM Jeff Evans
wrote:
>
> The org.apache.spark.launcher.SparkLauncher is used to
If you're not using spark-submit, then that option does nothing.
If by "context creation API" you mean "new SparkContext()" or an
equivalent, then you're explicitly creating the driver inside your
application.
On Tue, Mar 26, 2019 at 1:56 PM Pat Ferrel wrote:
>
> I have a server that starts a
I don't think "spark.authenticate" works properly with k8s in 2.4
(which would make it impossible to enable encryption since it requires
authentication). I'm pretty sure I fixed it in master, though.
On Tue, Mar 26, 2019 at 2:29 AM Sinha, Breeta (Nokia - IN/Bangalore)
wrote:
>
> Hi All,
>
>
>
>
It doesn't work (except if you're extremely lucky); it will eat your
lunch and will also kick your dog.
And it's not even going to be an option in the next version of Spark.
On Wed, Mar 13, 2019 at 11:38 PM Ido Friedman wrote:
>
> Hi,
>
> I am researching the use of multiple sparkcontext in one
Hi,
On Tue, Jan 22, 2019 at 11:30 AM Pola Yao wrote:
> "Thread-1" #19 prio=5 os_prio=0 tid=0x7f9b6828e800 nid=0x77cb waiting on
> condition [0x7f9a123e3000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for
.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> "VM Thread" os_p
s to
> exit the spark (e.g., System.exit()), but failed. Is there an explicit way to
> shutdown all the alive threads in the spark application and then quit
> afterwards?
>
>
> On Tue, Jan 15, 2019 at 2:38 PM Marcelo Vanzin wrote:
>>
>> You should check the active thread
You should check the active threads in your app. Since your pool uses
non-daemon threads, that will prevent the app from exiting.
spark.stop() should have stopped the Spark jobs in other threads, at
least. But if something is blocking one of those threads, or if
something is creating a non-daemon
h “kms-dt”.
>
>
>
> Anyone knows why this is happening ? Any suggestion to make it working
> with KMS ?
>
>
>
> Thanks
>
>
>
>
>
>
>
>
> *Paolo Platter*
>
> *CTO*
>
> E-mail:paolo.plat...@ag
If you are using the principal / keytab params, Spark should create
tokens as needed. If it's not, something else is going wrong, and only
looking at full logs for the app would help.
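For reference, a sketch of those params on spark-submit (the principal and keytab path are placeholders):

```shell
spark-submit \
  --principal someuser@EXAMPLE.COM \
  --keytab /path/to/someuser.keytab \
  ...
```

With those set, Spark logs in from the keytab and creates or renews delegation tokens itself.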
On Wed, Jan 2, 2019 at 5:09 PM Ali Nazemian wrote:
>
> Hi,
>
> We are using a headless keytab to run our
First, it's really weird to use "org.apache.spark" for a class that is
not in Spark.
For executors, the jar file of the sink needs to be in the system
classpath; the application jar is not in the system classpath, so that
does not work. There are different ways for you to get it there, most
of
+user@
>> -- Forwarded message -
>> From: Wenchen Fan
>> Date: Thu, Nov 8, 2018 at 10:55 PM
>> Subject: [ANNOUNCE] Announcing Apache Spark 2.4.0
>> To: Spark dev list
>>
>>
>> Hi all,
>>
>> Apache Spark 2.4.0 is the fifth release in the 2.x line. This release adds
>> Barrier
production application continues to submit jobs every once in a while,
> the issue persists.
>
> On Wed, Oct 24, 2018 at 5:05 PM Marcelo Vanzin wrote:
>>
>> When you say many jobs at once, what ballpark are you talking about?
>>
>> The code in 2.3+ does try to ke
cala.concurrent._
> scala> import scala.concurrent.ExecutionContext.Implicits.global
> scala> for (i <- 0 until 5) { Future { println(sc.parallelize(0 until
> i).collect.length) } }
>
> On Mon, Oct 22, 2018 at 11:25 AM Marcelo Vanzin wrote:
>>
>> Just tried on 2.3.2 and worked fine f
Just tried on 2.3.2 and worked fine for me. UI had a single job and a
single stage (+ the tasks related to that single stage), same thing in
memory (checked with jvisualvm).
On Sat, Oct 20, 2018 at 6:45 PM Marcelo Vanzin wrote:
>
> On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
> wro
On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
wrote:
> I recently upgraded to spark 2.3.1 I have had these same settings in my spark
> submit script, which worked on 2.0.2, and according to the documentation
> appear to not have changed:
>
> spark.ui.retainedTasks=1
> spark.ui.retainedStages=1
Spark only does Kerberos authentication on the driver. For executors it
currently only supports Hadoop's delegation tokens for Kerberos.
To use something that does not support delegation tokens you have to
manually manage the Kerberos login in your code that runs in executors,
which might be
k/blob/88e7e87bd5c052e10f52d4bb97a9d78f5b524128/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala#L31
>> >
>> > The code shows Spark will try to find the path if SPARK_HOME is specified.
>> > And on my worker node, SPARK_HOME is specified in .bashrc , for the
for the
> pre-installed 2.2.1 path.
>
> I don't want to make any changes to worker node configuration, so any way to
> override the order?
>
> Jianshi
>
> On Fri, Oct 5, 2018 at 12:11 AM Marcelo Vanzin wrote:
>>
>> Normally the version of Spark installed on the
Normally the version of Spark installed on the cluster does not
matter, since Spark is uploaded from your gateway machine to YARN by
default.
You probably have some configuration (in spark-defaults.conf) that
tells YARN to use a cached copy. Get rid of that configuration, and
you can use whatever
See SPARK-4160. Long story short: you need to upload the files and
jars to some shared storage (like HDFS) manually.
On Wed, Sep 5, 2018 at 2:17 AM Guillermo Ortiz Fernández
wrote:
>
> I'm using standalone cluster and the final command I'm trying is:
> spark-submit --verbose --deploy-mode cluster
I'm not familiar with PyCharm. But if you can run "pyspark" from the
command line and not hit this, then this might be an issue with
PyCharm or your environment - e.g. having an old version of the
pyspark code around, or maybe PyCharm itself might need to be updated.
On Thu, Jun 14, 2018 at 10:01
I only know of a way to do that with YARN.
You can distribute the jar files using "--files" and add just their
names (not the full path) to the "extraClassPath" configs. You don't
need "userClassPathFirst" in that case.
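A sketch of that combination (the jar names and paths are made up):

```shell
spark-submit \
  --files /local/path/dep1.jar,/local/path/dep2.jar \
  --conf spark.driver.extraClassPath=dep1.jar:dep2.jar \
  --conf spark.executor.extraClassPath=dep1.jar:dep2.jar \
  ...
```

The bare names work because YARN places --files in each container's working directory, which is also the process's start directory.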
On Thu, Jun 14, 2018 at 1:28 PM, Arjun kr wrote:
> Hi All,
>
>
> I am
We are happy to announce the availability of Spark 2.3.1!
Apache Spark 2.3.1 is a maintenance release, based on the branch-2.3
maintenance branch of Spark. We strongly recommend that all 2.3.x users
upgrade to this stable release.
To download Spark 2.3.1, head over to the download page:
That feature has not been implemented yet.
https://issues.apache.org/jira/browse/SPARK-11033
On Wed, Jun 6, 2018 at 5:18 AM, Behroz Sikander wrote:
> I have a client application which launches multiple jobs in Spark Cluster
> using SparkLauncher. I am using Standalone cluster mode. Launching
I already gave my recommendation in my very first reply to this thread...
On Fri, May 25, 2018 at 10:23 AM, raksja wrote:
> ok, when to use what?
> do you have any recommendation?
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
>
On Fri, May 25, 2018 at 10:18 AM, raksja wrote:
> InProcessLauncher would just start a subprocess as you mentioned earlier.
No. As the name says, it runs things in the same process.
--
Marcelo
That's what Spark uses.
On Fri, May 25, 2018 at 10:09 AM, raksja wrote:
> thanks for the reply.
>
> Have you tried submit a spark job directly to Yarn using YarnClient.
> https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/yarn/client/api/YarnClient.html
>
> Not
On Wed, May 23, 2018 at 12:04 PM, raksja wrote:
> So InProcessLauncher wouldnt use the native memory, so will it overload the
> mem of parent process?
It will still use "native memory" (since the parent process will still
use memory), just less of it. But yes, it will use
On Tue, May 22, 2018 at 12:45 AM, Makoto Hashimoto
wrote:
> local:///usr/local/oss/spark-2.3.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.0.jar
Is that the path of the jar inside your docker image? The default
image puts that in /opt/spark IIRC.
--
Marcelo
You can either:
- set spark.yarn.submit.waitAppCompletion=false, which will make
spark-submit go away once the app starts in cluster mode.
- use the (new in 2.3) InProcessLauncher class + some custom Java code
to submit all the apps from the same "launcher" process.
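For the first option, that's a one-line setting (shown here as a spark-defaults.conf fragment):

```properties
# Cluster mode only: spark-submit returns once the app is accepted,
# instead of polling until the app finishes.
spark.yarn.submit.waitAppCompletion=false
```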
On Wed, May 16, 2018 at 1:45
KVStore library of spark). Is there a way to fetch data from this
> KVStore (which uses levelDb for storage) and filter it on basis on
> timestamp?
>
> Thanks,
> Anshi
>
> On Mon, May 7, 2018 at 9:51 PM, Marcelo Vanzin [via Apache Spark User List]
> <ml+s1001560n32114...@n3.n
Using a custom Guava version with Spark is not that simple. Spark
shades Guava, but a lot of libraries Spark uses do not - the main one
being all of the Hadoop ones, and they need a quite old Guava.
So you have two options: shade/relocate Guava in your application, or
use
On Mon, May 7, 2018 at 1:44 AM, Anshi Shrivastava
wrote:
> I've found a KVStore wrapper which stores all the metrics in a LevelDb
> store. This KVStore wrapper is available as a spark-dependency but we cannot
> access the metrics directly from spark since they are
plication is running on
> k8 but listener is not getting invoked
>
>
> On Monday, April 30, 2018, Marcelo Vanzin <van...@cloudera.com> wrote:
>>
>> I'm pretty sure this feature hasn't been implemented for the k8s backend.
>>
>> On Mon, Apr 30, 2018 at 4:51 PM, pur
There are two things you're doing wrong here:
On Thu, Apr 12, 2018 at 6:32 PM, jb44 wrote:
> Then I can add the alluxio client library like so:
> sparkSession.conf.set("spark.driver.extraClassPath", ALLUXIO_SPARK_CLIENT)
First one, you can't modify JVM configuration after it
This is the problem:
> :/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar;/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
Seems like some code is confusing things when mixing OSes. It's using
the Windows separator when building a command line to be run on a
Linux host.
On Tue, Apr
Why: it's part historical, part "how else would you do it".
SparkConf needs to read properties read from the command line, but
SparkConf is something that user code instantiates, so we can't easily
make it read data from arbitrary locations. You could use thread
locals and other tricks, but user
On Mon, Mar 26, 2018 at 1:08 PM, Gauthier Feuillen
wrote:
> Is there a way to change this value without changing yarn-site.xml ?
No. Local dirs are defined by the NodeManager, and Spark cannot override them.
--
Marcelo
On Mon, Mar 26, 2018 at 11:01 AM, Fawze Abujaber wrote:
> Weird, I just ran spark-shell and it's log is comprised but my spark jobs
> that scheduled using oozie is not getting compressed.
Ah, then it's probably a problem with how Oozie is generating the
config for the Spark
/application_1522085988298_0002.snappy
On Mon, Mar 26, 2018 at 10:48 AM, Fawze Abujaber <fawz...@gmail.com> wrote:
> I distributed this config to all the nodes cross the cluster and with no
> success, new spark logs still uncompressed.
>
> On Mon, Mar 26, 2018 at 8:12 PM, M
ration
>
> On Mon, 26 Mar 2018 at 20:05 Marcelo Vanzin <van...@cloudera.com> wrote:
>>
>> If the spark-defaults.conf file in the machine where you're starting
>> the Spark app has that config, then that's all that should be needed.
>>
>> On Mon, Mar 26
ompressed but I don’t , do I
> need to perform restart to spark or Yarn?
>
> On Mon, 26 Mar 2018 at 19:53 Marcelo Vanzin <van...@cloudera.com> wrote:
>>
>> Log compression is a client setting. Doing that will make new apps
>> write event logs in compressed form
Log compression is a client setting. Doing that will make new apps
write event logs in compressed format.
The SHS doesn't compress existing logs.
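As a spark-defaults.conf fragment on the submitting machine, that's:

```properties
spark.eventLog.enabled true
# New event logs are written compressed; existing logs are untouched.
spark.eventLog.compress true
```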
On Mon, Mar 26, 2018 at 9:17 AM, Fawze Abujaber wrote:
> Hi All,
>
> I'm trying to compress the logs at SPark history server, i
They should be available in the current user.
UserGroupInformation.getCurrentUser().getCredentials()
On Wed, Mar 21, 2018 at 7:32 AM, Jorge Machado wrote:
> Hey spark group,
>
> I want to create a Delegation Token Provider for Accumulo I have One
> Question:
>
> How can I get the
From spark-submit -h:

  --files FILES    Comma-separated list of files to be placed in the working
                   directory of each executor. File paths of these files in
                   executors can be accessed via SparkFiles.get(fileName).
On Sun, Mar
ineBufferedStream: stdout: at
> javax.security.auth.Subject.doAs(Subject.java:422)
> 18/03/13 00:19:13 INFO LineBufferedStream: stdout: at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
> 18/03/13 00:19:13 INFO LineBufferedStream: stdout: at
> org.apache.hadoop.ipc.Server$Handler.run(S
That's not an error, just a warning. The docs [1] have more info about
the config options mentioned in that message.
[1] http://spark.apache.org/docs/latest/running-on-yarn.html
On Mon, Mar 12, 2018 at 4:42 PM, kant kodali wrote:
> Hi All,
>
> I am trying to use YARN for the
According to https://issues.apache.org/jira/browse/SPARK-19558 this
feature was added in 2.3.
On Fri, Feb 16, 2018 at 12:43 AM, kurian vs wrote:
> Hi,
>
> I was trying to create a custom Query execution listener by extending the
>
On Wed, Jan 3, 2018 at 8:18 PM, John Zhuge wrote:
> Something like:
>
> Note: When running Spark on YARN, environment variables for the executors
> need to be set using the spark.yarn.executorEnv.[EnvironmentVariableName]
> property in your conf/spark-defaults.conf file or
n?
>
>
> On Wed, Jan 3, 2018 at 9:59 AM, Marcelo Vanzin <van...@cloudera.com> wrote:
>>
>> On Tue, Jan 2, 2018 at 10:57 PM, John Zhuge <jzh...@apache.org> wrote:
>> > I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop 2.7.3 cluster. Is
>&
On Tue, Jan 2, 2018 at 10:57 PM, John Zhuge wrote:
> I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop 2.7.3 cluster. Is
> spark-env.sh sourced when starting the Spark AM container or the executor
> container?
No, it's not.
--
Marcelo
This sounds like something mapPartitions should be able to do, not
sure if there's an easier way.
On Thu, Dec 14, 2017 at 10:20 AM, Don Drake wrote:
> I'm looking for some advice when I have a flatMap on a Dataset that is
> creating and returning a sequence of a new case
On Wed, Dec 13, 2017 at 11:21 AM, Toy wrote:
> I'm wondering why am I seeing 5 attempts for my Spark application? Does Spark
> application restart itself?
It restarts itself if it fails (up to a limit that can be configured
either per Spark application or globally in
The closure in your "foreach" loop runs in a remote executor, not the
local JVM, so it's updating its own copy of the t-digest instance. The
one on the driver side is never touched.
On Sun, Dec 10, 2017 at 10:27 PM, Himasha de Silva wrote:
> Hi,
>
> I want to load a spark
That's the Spark Master's view of the application. I don't know
exactly what it means in the different run modes, I'm more familiar
with YARN. But I wouldn't be surprised if, as with others, it mostly
tracks the driver's state.
On Thu, Dec 7, 2017 at 12:06 PM, bsikander
On Thu, Dec 7, 2017 at 11:40 AM, bsikander wrote:
> For example, if an application wanted 4 executors
> (spark.executor.instances=4) but the spark cluster can only provide 1
> executor. This means that I will only receive 1 onExecutorAdded event. Will
> the application state
On Tue, Dec 5, 2017 at 12:43 PM, bsikander wrote:
> 2) If I use context.addSparkListener, I can customize the listener but then
> I miss the onApplicationStart event. Also, I don't know the Spark's logic to
> changing the state of application from WAITING -> RUNNING.
I'm not
SparkLauncher operates at a different layer than Spark applications.
It doesn't know about executors or driver or anything, just whether
the Spark application was started or not. So it doesn't work for your
case.
The best option for your case is to install a SparkListener and
monitor events. But
I'd recommend against using the built-in jars for a different version
of Hive. You don't need to build your own Spark; just set
spark.sql.hive.metastore.jars / spark.sql.hive.metastore.version (see
documentation).
On Thu, Nov 9, 2017 at 2:10 AM, yaooqinn wrote:
> Hi, all
>
You don't need to collect data in the driver to save it. The code in
the original question doesn't use "collect()", so it's actually doing
a distributed write.
On Mon, Oct 2, 2017 at 11:26 AM, JG Perrin wrote:
> Steve,
>
>
>
> If I refer to the collect() API, it says
Jars distributed using --jars are not added to the system classpath,
so log4j cannot see them.
To work around that, you need to manually add the jar, by *name*, to
the driver and executor classpaths:
spark.driver.extraClassPath=some.jar
spark.executor.extraClassPath=some.jar
In client mode you should
Hello,
This is a CDH-specific issue, please use the Cloudera forums / support
line instead of the Apache group.
On Thu, Jul 27, 2017 at 10:54 AM, Vikash Kumar
wrote:
> I have installed spark2 parcel through cloudera CDH 12.0. I see some issue
> there. Look like
On Wed, Jul 26, 2017 at 10:45 PM, satishl wrote:
> is this a supported scenario - i.e., can I run app compiled with spark 1.6
> on a 2.+ spark cluster?
In general, no.
--
Marcelo
On Mon, Jul 24, 2017 at 6:04 PM, Hyukjin Kwon wrote:
> However, I see some JIRAs are assigned to someone time to time. Were those
> mistakes or would you mind if I ask when someone is assigned?
I'm not sure if there are any guidelines of when to assign; since
there has been
We don't generally set assignees. Submit a PR on github and the PR
will be linked on JIRA; if your PR is merged, then the bug is
assigned to you.
On Mon, Jul 24, 2017 at 5:57 PM, 萝卜丝炒饭 <1427357...@qq.com> wrote:
> Hi all,
> If I want to do some work about an issue registed in JIRA, how to set
On Fri, Jul 21, 2017 at 5:00 AM, Gokula Krishnan D wrote:
> Is there anyway can we setup the scheduler mode in Spark Cluster level
> besides application (SC level).
That's called the cluster (or resource) manager. e.g., configure
separate queues in YARN with a maximum number
Also, things seem to work with all your settings if you disable use of
the shuffle service (which also means no dynamic allocation), if that
helps you make progress in what you wanted to do.
On Thu, Jul 20, 2017 at 4:25 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
> Hmm..
thing meaningful. Please find
> it attached. Can you please take a quick look, and let me know if you see
> anything suspicious ?
>
> If not, do you think I should open a JIRA for this ?
>
> Thanks !
>
> On Wed, Jul 19, 2017 at 3:14 PM, Marcelo Vanzin <van...@cloudera.com&g
y clue about this ?
>
>
> On Wed, Jul 19, 2017 at 1:13 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>>
>> On Wed, Jul 19, 2017 at 1:10 PM, Udit Mehrotra
>> <udit.mehrotr...@gmail.com> wrote:
>> > Is there any additional configuration I
On Wed, Jul 19, 2017 at 1:10 PM, Udit Mehrotra
wrote:
> Is there any additional configuration I need for external shuffle besides
> setting the following:
> spark.network.crypto.enabled true
> spark.network.crypto.saslFallback false
> spark.authenticate
ing else I am missing, or I can
> try differently ?
>
>
> Thanks !
>
>
> On Wed, Jul 19, 2017 at 12:03 PM, Marcelo Vanzin <van...@cloudera.com>
> wrote:
>>
>> Please include the list on your replies, so others can benefit from
>> the discussion too.
&
Please include the list on your replies, so others can benefit from
the discussion too.
On Wed, Jul 19, 2017 at 11:43 AM, Udit Mehrotra
wrote:
> Hi Marcelo,
>
> Thanks a lot for confirming that. Can you explain what you mean by upgrading
> the version of shuffle
On Wed, Jul 19, 2017 at 11:19 AM, Udit Mehrotra
wrote:
> spark.network.crypto.saslFallback false
> spark.authenticate true
>
> This seems to work fine with internal shuffle service of Spark. However,
> when in I try it with Yarn’s external shuffle service
On Tue, Jul 18, 2017 at 7:21 PM, Ivan Sadikov wrote:
> Repository that I linked to does not require rebuilding Spark and could be
> used with current distribution, which is preferable in my case.
Fair enough, although that means that you're re-implementing the Spark
UI,
See SPARK-18085. That has much of the same goals re: SHS resource
usage, and also provides a (currently non-public) API where you could
just create a MongoDB implementation if you want.
On Tue, Jul 18, 2017 at 12:56 AM, Ivan Sadikov wrote:
> Hello everyone!
>
> I have
>
>
>
>
> On 17 July 2017 at 18:46, Marcelo Vanzin <van...@cloudera.com> wrote:
>>
>> The YARN backend distributes all files
...@gmail.com> wrote:
>>
>> Hi Mitch
>>
>> your jar file can be anywhere in the file system, including hdfs.
>>
>> If using yarn, preferably use cluster mode in terms of deployment.
>>
>> Yarn will distribute the jar to each container.
>>
>> Bes
Spark distributes your application jar for you.
On Mon, Jul 17, 2017 at 8:41 AM, Mich Talebzadeh
wrote:
> hi guys,
>
>
> an uber/fat jar file has been created to run with spark in CDH yarc client
> mode.
>
> As usual job is submitted to the edge node.
>
> does the jar
That thread looks like the connection between the Spark process and
jvisualvm. It's expected to show high up when doing sampling if the
app is not doing much else.
On Fri, Jun 23, 2017 at 10:46 AM, Reth RM wrote:
> Running a spark job on local machine and profiler results
On Sat, Jun 3, 2017 at 7:16 PM, Mohammad Tariq wrote:
> I am having a bit of difficulty in understanding the exact behaviour of
> SparkAppHandle.Listener.infoChanged(SparkAppHandle handle) method. The
> documentation says :
>
> Callback for changes in any information that is
On Thu, May 18, 2017 at 10:10 AM, Nipun Arora wrote:
> I wanted to know how to get the the input and output streams from
> SparkAppHandle?
You can't. You can redirect the output, but not directly get the streams.
--
Marcelo
scalastyle runs on the "verify" phase, which is after package but
before install.
On Wed, May 17, 2017 at 5:47 PM, yiskylee wrote:
> ./build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean
> package
> works, but
> ./build/mvn -Pyarn -Phadoop-2.4
http://spark.apache.org/docs/latest/configuration.html#shuffle-behavior
All the options you need to know are there.
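As a rough guide (option names per the Spark 2.1/2.2 docs; verify them against the page above):

```properties
# RPC encryption requires authentication to be enabled.
spark.authenticate true
spark.network.crypto.enabled true
# Encrypts shuffle and spill files written to local disk.
spark.io.encryption.enabled true
```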
On Fri, May 12, 2017 at 9:11 AM, Shashi Vishwakarma
wrote:
> Hi
>
> I was doing research on encrypting spark shuffle data and found that Spark
> 2.1 has
On Tue, May 2, 2017 at 9:07 AM, Nan Zhu wrote:
> I have no easy way to pass jar path to those forked Spark
> applications? (except that I download jar from a remote path to a local temp
> dir after resolving some permission issues, etc.?)
Yes, that's the only way
Remote jars are added to executors' classpaths, but not the driver's.
In YARN cluster mode, they would also be added to the driver's class
path.
On Tue, May 2, 2017 at 8:43 AM, Nan Zhu wrote:
> Hi, all
>
> For some reason, I tried to pass in a HDFS path to the --jars
amingContext() from my code in the
> previous email.
>
> On Wed, Apr 19, 2017 at 1:46 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>>
>> Why are you not using JavaStreamingContext if you're writing Java?
>>
>> On Wed, Apr 19, 2017 at 1:42 PM, kant kodali &
Why are you not using JavaStreamingContext if you're writing Java?
On Wed, Apr 19, 2017 at 1:42 PM, kant kodali wrote:
> Hi All,
>
> I get the following errors whichever way I try either lambda or generics. I
> am using
> spark 2.1 and scalla 2.11.8
>
>
> StreamingContext ssc
It's linked from the YARN RM's Web UI (see the "Application Master"
link for the running application).
On Mon, Mar 13, 2017 at 6:53 AM, Sourav Mazumder
wrote:
> Hi,
>
> Is there a way to monitor an ongoing Spark Job when running in Yarn Cluster
> mode ?
>
> In my
ot;success" : true
> }
> ./test3.sh: line 15: --num-decimals=1000: command not found
> ./test3.sh: line 16: --second-argument=Arg2: command not found
>
>
>
> From: Marcelo Vanzin <van...@cloudera.com>
> Sent: Tuesday, February 28, 2017 12:17:49 P
Everything after the jar path is passed to the main class as
parameters. So if it's not working you're probably doing something
wrong in your code (that you haven't posted).
On Tue, Feb 28, 2017 at 7:05 AM, Joe Olson wrote:
> For spark-submit, I know I can submit application
> none of my Config settings
Is it none of the configs or just the queue? You can't set the YARN
queue in cluster mode through code, it has to be set in the command
line. It's a chicken & egg problem (in cluster mode, the YARN app is
created before your code runs).
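In other words, something like this on the command line (the queue name is a placeholder):

```shell
spark-submit --master yarn --deploy-mode cluster --queue myqueue ...
```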
--property-file works the
Spark has never shaded dependencies (in the sense of renaming the classes),
with a couple of exceptions (Guava and Jetty). So that behavior is nothing
new. Spark's dependencies themselves have a lot of other dependencies, so
doing that would have limited benefits anyway.
On Tue, Jan 31, 2017 at
s I meant submitting through spark-submit.
>
> so If I do spark-submit A.jar and spark-submit A.jar again. Do I get two
> UI's or one UI'? and which ports do they run on when using the stand alone
> mode?
>
> On Mon, Jan 23, 2017 at 12:19 PM, Marcelo Vanzin <van...@cloudera.com&g
rote:
> hmm..I guess in that case my assumption of "app" is wrong. I thought the app
> is a client jar that you submit. no? If so, say I submit multiple jobs then
> I get two UI'S?
>
> On Mon, Jan 23, 2017 at 12:07 PM, Marcelo Vanzin <van...@cloudera.com>
> wrote
rk.apache.org/docs/latest/security.html#standalone-mode-only
>
> On Mon, Jan 23, 2017 at 11:51 AM, Marcelo Vanzin <van...@cloudera.com>
> wrote:
>>
>> That's the Master, whose default port is 8080 (not 4040). The default
>> port for the app's UI is 4040.
>>
That's the Master, whose default port is 8080 (not 4040). The default
port for the app's UI is 4040.
On Mon, Jan 23, 2017 at 11:47 AM, kant kodali wrote:
> I am not sure why Spark web UI keeps changing its port every time I restart
> a cluster? how can I make it run always on
(-dev, +user. dev is for Spark development, not for questions about
using Spark.)
You haven't posted code here or the actual error. But you might be
running into SPARK-15754. Or into other issues with yarn-client mode
and "--principal / --keytab" (those have known issues in client mode).
If you