Re: Issues with Apache Spark tgz file

2019-12-30 Thread Marcelo Vanzin
That first URL is not the file. It's a web page with links to the file in different mirrors. I just looked at the actual file in one of the mirrors and it looks fine. On Mon, Dec 30, 2019 at 1:34 PM rsinghania wrote: > > Hi, > > I'm trying to open the file >

Re: Is it possible to obtain the full command to be invoked by SparkLauncher?

2019-04-24 Thread Marcelo Vanzin
BTW the SparkLauncher API has hooks to capture the stderr of the spark-submit process into the logging system of the parent process. Check the API javadocs since it's been forever since I looked at that. On Wed, Apr 24, 2019 at 1:58 PM Marcelo Vanzin wrote: > > S
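
A minimal sketch of that hook, using the launcher's redirectToLog (app path, class, and logger name are placeholders):

    import org.apache.spark.launcher.SparkLauncher

    // Pipe spark-submit's output into a java.util.logging logger;
    // the logger name "spark.pipeline" is just an example.
    val handle = new SparkLauncher()
      .setAppResource("/path/to/app.jar")
      .setMainClass("com.example.Main")
      .redirectToLog("spark.pipeline")
      .startApplication()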

Re: Is it possible to obtain the full command to be invoked by SparkLauncher?

2019-04-24 Thread Marcelo Vanzin
Setting the SPARK_PRINT_LAUNCH_COMMAND env variable to 1 in the launcher env will make Spark code print the command to stderr. Not optimal but I think it's the only current option. On Wed, Apr 24, 2019 at 1:55 PM Jeff Evans wrote: > > The org.apache.spark.launcher.SparkLauncher is used to
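
A sketch of setting that variable through the launcher's child-process environment (app path and class are placeholders):

    import org.apache.spark.launcher.SparkLauncher
    import java.util.Collections

    // The SparkLauncher constructor takes extra environment variables
    // for the spark-submit child process it forks.
    val env = Collections.singletonMap("SPARK_PRINT_LAUNCH_COMMAND", "1")
    val launcher = new SparkLauncher(env)
      .setAppResource("/path/to/app.jar")
      .setMainClass("com.example.Main")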

Re: spark.submit.deployMode: cluster

2019-03-26 Thread Marcelo Vanzin
If you're not using spark-submit, then that option does nothing. If by "context creation API" you mean "new SparkContext()" or an equivalent, then you're explicitly creating the driver inside your application. On Tue, Mar 26, 2019 at 1:56 PM Pat Ferrel wrote: > > I have a server that starts a

Re: RPC timeout error for AES based encryption between driver and executor

2019-03-26 Thread Marcelo Vanzin
I don't think "spark.authenticate" works properly with k8s in 2.4 (which would make it impossible to enable encryption since it requires authentication). I'm pretty sure I fixed it in master, though. On Tue, Mar 26, 2019 at 2:29 AM Sinha, Breeta (Nokia - IN/Bangalore) wrote: > > Hi All, > > > >

Re: Multiple context in one Driver

2019-03-14 Thread Marcelo Vanzin
It doesn't work (except if you're extremely lucky); it will eat your lunch and kick your dog. And it's not even going to be an option in the next version of Spark. On Wed, Mar 13, 2019 at 11:38 PM Ido Friedman wrote: > > Hi, > > I am researching the use of multiple sparkcontext in one

Re: How to force-quit a Spark application?

2019-01-24 Thread Marcelo Vanzin
Hi, On Tue, Jan 22, 2019 at 11:30 AM Pola Yao wrote: > "Thread-1" #19 prio=5 os_prio=0 tid=0x7f9b6828e800 nid=0x77cb waiting on > condition [0x7f9a123e3000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for

Re: How to force-quit a Spark application?

2019-01-16 Thread Marcelo Vanzin
.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > > "VM Thread" os_p

Re: How to force-quit a Spark application?

2019-01-16 Thread Marcelo Vanzin
s to > exit the spark (e.g., System.exit()), but failed. Is there an explicit way to > shutdown all the alive threads in the spark application and then quit > afterwards? > > > On Tue, Jan 15, 2019 at 2:38 PM Marcelo Vanzin wrote: >> >> You should check the active thread

Re: How to force-quit a Spark application?

2019-01-15 Thread Marcelo Vanzin
You should check the active threads in your app. Since your pool uses non-daemon threads, that will prevent the app from exiting. spark.stop() should have stopped the Spark jobs in other threads, at least. But if something is blocking one of those threads, or if something is creating a non-daemon
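
A quick way to list the non-daemon threads keeping the JVM alive (plain JDK APIs, nothing Spark-specific):

    import scala.collection.JavaConverters._

    // Any live non-daemon thread here will prevent the JVM from exiting.
    Thread.getAllStackTraces.keySet.asScala
      .filter(t => t.isAlive && !t.isDaemon)
      .foreach(t => println(s"${t.getName}: ${t.getState}"))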

Re: How to reissue a delegated token after max lifetime passes for a spark streaming application on a Kerberized cluster

2019-01-03 Thread Marcelo Vanzin
h “kms-dt”. > > Anyone knows why this is happening ? Any suggestion to make it working > with KMS ? > > Thanks > > *Paolo Platter* > *CTO* > E-mail:paolo.plat...@ag

Re: How to reissue a delegated token after max lifetime passes for a spark streaming application on a Kerberized cluster

2019-01-03 Thread Marcelo Vanzin
If you are using the principal / keytab params, Spark should create tokens as needed. If it's not, something else is going wrong, and only looking at full logs for the app would help. On Wed, Jan 2, 2019 at 5:09 PM Ali Nazemian wrote: > > Hi, > > We are using a headless keytab to run our
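
For reference, the flags in question look like this (principal and keytab path are placeholders):

    spark-submit --master yarn \
      --principal user@EXAMPLE.COM \
      --keytab /path/to/user.keytab \
      ...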

Re: Custom Metric Sink on Executor Always ClassNotFound

2018-12-20 Thread Marcelo Vanzin
First, it's really weird to use "org.apache.spark" for a class that is not in Spark. For executors, the jar file of the sink needs to be in the system classpath; the application jar is not in the system classpath, so that does not work. There are different ways for you to get it there, most of

Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-08 Thread Marcelo Vanzin
+user@ >> -- Forwarded message - >> From: Wenchen Fan >> Date: Thu, Nov 8, 2018 at 10:55 PM >> Subject: [ANNOUNCE] Announcing Apache Spark 2.4.0 >> To: Spark dev list >> >> >> Hi all, >> >> Apache Spark 2.4.0 is the fifth release in the 2.x line. This release adds >> Barrier

Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-25 Thread Marcelo Vanzin
production application continues to submit jobs every once in a while, > the issue persists. > > On Wed, Oct 24, 2018 at 5:05 PM Marcelo Vanzin wrote: >> >> When you say many jobs at once, what ballpark are you talking about? >> >> The code in 2.3+ does try to ke

Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-24 Thread Marcelo Vanzin
cala.concurrent._ > scala> import scala.concurrent.ExecutionContext.Implicits.global > scala> for (i <- 0 until 5) { Future { println(sc.parallelize(0 until > i).collect.length) } } > > On Mon, Oct 22, 2018 at 11:25 AM Marcelo Vanzin wrote: >> >> Just tried on 2.3.2 and worked fine f

Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-22 Thread Marcelo Vanzin
Just tried on 2.3.2 and worked fine for me. UI had a single job and a single stage (+ the tasks related to that single stage), same thing in memory (checked with jvisualvm). On Sat, Oct 20, 2018 at 6:45 PM Marcelo Vanzin wrote: > > On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown > wro

Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-20 Thread Marcelo Vanzin
On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown wrote: > I recently upgraded to spark 2.3.1 I have had these same settings in my spark > submit script, which worked on 2.0.2, and according to the documentation > appear to not have changed: > > spark.ui.retainedTasks=1 > spark.ui.retainedStages=1

Re: kerberos auth for MS SQL server jdbc driver

2018-10-15 Thread Marcelo Vanzin
Spark only does Kerberos authentication on the driver. For executors it currently only supports Hadoop's delegation tokens for Kerberos. To use something that does not support delegation tokens you have to manually manage the Kerberos login in your code that runs in executors, which might be
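
A rough sketch of the manual approach, assuming an RDD named rdd and a keytab shipped to executors with --files (principal and file name are placeholders):

    import org.apache.hadoop.security.UserGroupInformation
    import org.apache.spark.SparkFiles

    rdd.foreachPartition { _ =>
      // Runs on the executor: log in from the distributed keytab before
      // opening the Kerberos-authenticated JDBC connection.
      UserGroupInformation.loginUserFromKeytab(
        "user@EXAMPLE.COM", SparkFiles.get("user.keytab"))
      // ... JDBC work goes here ...
    }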

Re: Specifying different version of pyspark.zip and py4j files on worker nodes with Spark pre-installed

2018-10-05 Thread Marcelo Vanzin
k/blob/88e7e87bd5c052e10f52d4bb97a9d78f5b524128/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala#L31 >> > >> > The code shows Spark will try to find the path if SPARK_HOME is specified. >> > And on my worker node, SPARK_HOME is specified in .bashrc , for the

Re: Specifying different version of pyspark.zip and py4j files on worker nodes with Spark pre-installed

2018-10-04 Thread Marcelo Vanzin
for the > pre-installed 2.2.1 path. > > I don't want to make any changes to worker node configuration, so any way to > override the order? > > Jianshi > > On Fri, Oct 5, 2018 at 12:11 AM Marcelo Vanzin wrote: >> >> Normally the version of Spark installed on the

Re: Specifying different version of pyspark.zip and py4j files on worker nodes with Spark pre-installed

2018-10-04 Thread Marcelo Vanzin
Normally the version of Spark installed on the cluster does not matter, since Spark is uploaded from your gateway machine to YARN by default. You probably have some configuration (in spark-defaults.conf) that tells YARN to use a cached copy. Get rid of that configuration, and you can use whatever
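
The cached-copy configuration is usually one of these in spark-defaults.conf (paths are placeholders); removing it restores the default upload behavior:

    spark.yarn.jars     hdfs:///apps/spark/jars/*
    spark.yarn.archive  hdfs:///apps/spark/spark-libs.jar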

Re: deploy-mode cluster. FileNotFoundException

2018-09-05 Thread Marcelo Vanzin
See SPARK-4160. Long story short: you need to upload the files and jars to some shared storage (like HDFS) manually. On Wed, Sep 5, 2018 at 2:17 AM Guillermo Ortiz Fernández wrote: > > I'm using standalone cluster and the final command I'm trying is: > spark-submit --verbose --deploy-mode cluster
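
A sketch of the workaround (paths and master URL are placeholders):

    hdfs dfs -put app.jar hdfs:///apps/app.jar
    spark-submit --master spark://master:7077 --deploy-mode cluster \
      hdfs:///apps/app.jar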

Re: Issue upgrading to Spark 2.3.1 (Maintenance Release)

2018-06-15 Thread Marcelo Vanzin
I'm not familiar with PyCharm. But if you can run "pyspark" from the command line and not hit this, then this might be an issue with PyCharm or your environment - e.g. having an old version of the pyspark code around, or maybe PyCharm itself might need to be updated. On Thu, Jun 14, 2018 at 10:01

Re: Spark user classpath setting

2018-06-14 Thread Marcelo Vanzin
I only know of a way to do that with YARN. You can distribute the jar files using "--files" and add just their names (not the full path) to the "extraClassPath" configs. You don't need "userClassPathFirst" in that case. On Thu, Jun 14, 2018 at 1:28 PM, Arjun kr wrote: > Hi All, > > > I am
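
A sketch of that pattern (jar name is a placeholder); note that the classpath entries use only the bare file name:

    spark-submit --master yarn \
      --files /local/path/custom.jar \
      --conf spark.driver.extraClassPath=custom.jar \
      --conf spark.executor.extraClassPath=custom.jar \
      ...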

[ANNOUNCE] Announcing Apache Spark 2.3.1

2018-06-11 Thread Marcelo Vanzin
We are happy to announce the availability of Spark 2.3.1! Apache Spark 2.3.1 is a maintenance release, based on the branch-2.3 maintenance branch of Spark. We strongly recommend that all 2.3.x users upgrade to this stable release. To download Spark 2.3.1, head over to the download page:

Re: [SparkLauncher] stateChanged event not received in standalone cluster mode

2018-06-06 Thread Marcelo Vanzin
That feature has not been implemented yet. https://issues.apache.org/jira/browse/SPARK-11033 On Wed, Jun 6, 2018 at 5:18 AM, Behroz Sikander wrote: > I have a client application which launches multiple jobs in Spark Cluster > using SparkLauncher. I am using Standalone cluster mode. Launching

Re: Submit many spark applications

2018-05-25 Thread Marcelo Vanzin
I already gave my recommendation in my very first reply to this thread... On Fri, May 25, 2018 at 10:23 AM, raksja wrote: > ok, when to use what? > do you have any recommendation? > > > > -- > Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ > >

Re: Submit many spark applications

2018-05-25 Thread Marcelo Vanzin
On Fri, May 25, 2018 at 10:18 AM, raksja wrote: > InProcessLauncher would just start a subprocess as you mentioned earlier. No. As the name says, it runs things in the same process. -- Marcelo

Re: Submit many spark applications

2018-05-25 Thread Marcelo Vanzin
That's what Spark uses. On Fri, May 25, 2018 at 10:09 AM, raksja wrote: > thanks for the reply. > > Have you tried submit a spark job directly to Yarn using YarnClient. > https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/yarn/client/api/YarnClient.html > > Not

Re: Submit many spark applications

2018-05-23 Thread Marcelo Vanzin
On Wed, May 23, 2018 at 12:04 PM, raksja wrote: > So InProcessLauncher wouldnt use the native memory, so will it overload the > mem of parent process? It will still use "native memory" (since the parent process will still use memory), just less of it. But yes, it will use

Re: Encounter 'Could not find or load main class' error when submitting spark job on kubernetes

2018-05-22 Thread Marcelo Vanzin
On Tue, May 22, 2018 at 12:45 AM, Makoto Hashimoto wrote: > local:///usr/local/oss/spark-2.3.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.0.jar Is that the path of the jar inside your docker image? The default image puts that in /opt/spark IIRC. -- Marcelo

Re: Submit many spark applications

2018-05-16 Thread Marcelo Vanzin
You can either: - set spark.yarn.submit.waitAppCompletion=false, which will make spark-submit go away once the app starts in cluster mode. - use the (new in 2.3) InProcessLauncher class + some custom Java code to submit all the apps from the same "launcher" process. On Wed, May 16, 2018 at 1:45
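
A minimal sketch of the InProcessLauncher route (class and path are placeholders):

    import org.apache.spark.launcher.{InProcessLauncher, SparkAppHandle}

    // Submits the app from inside the current JVM instead of forking
    // a spark-submit process per application.
    val handle: SparkAppHandle = new InProcessLauncher()
      .setMaster("yarn")
      .setDeployMode("cluster")
      .setAppResource("/path/to/app.jar")
      .setMainClass("com.example.Main")
      .startApplication()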

Re: Spark UI Source Code

2018-05-09 Thread Marcelo Vanzin
KVStore library of spark). Is there a way to fetch data from this > KVStore (which uses levelDb for storage) and filter it on basis on > timestamp? > > Thanks, > Anshi > > On Mon, May 7, 2018 at 9:51 PM, Marcelo Vanzin [via Apache Spark User List] > <ml+s1001560n32114...@n3.n

Re: Guava dependency issue

2018-05-08 Thread Marcelo Vanzin
Using a custom Guava version with Spark is not that simple. Spark shades Guava, but a lot of libraries Spark uses do not - the main one being all of the Hadoop ones, and they need a quite old Guava. So you have two options: shade/relocate Guava in your application, or use
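
A sketch of the relocation option with sbt-assembly (the target package name is a placeholder):

    // build.sbt, with the sbt-assembly plugin enabled
    assemblyShadeRules in assembly := Seq(
      // Rewrites Guava classes in the fat jar so they cannot clash with
      // the old Guava that the Hadoop libraries need.
      ShadeRule.rename("com.google.common.**" -> "myapp.shaded.guava.@1").inAll
    )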

Re: Spark UI Source Code

2018-05-07 Thread Marcelo Vanzin
On Mon, May 7, 2018 at 1:44 AM, Anshi Shrivastava wrote: > I've found a KVStore wrapper which stores all the metrics in a LevelDb > store. This KVStore wrapper is available as a spark-dependency but we cannot > access the metrics directly from spark since they are

Re: Spark launcher listener not getting invoked k8s Spark 2.3

2018-04-30 Thread Marcelo Vanzin
plication is running on > k8 but listener is not getting invoked > > > On Monday, April 30, 2018, Marcelo Vanzin <van...@cloudera.com> wrote: >> >> I'm pretty sure this feature hasn't been implemented for the k8s backend. >> >> On Mon, Apr 30, 2018 at 4:51 PM, pur

Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-13 Thread Marcelo Vanzin
There are two things you're doing wrong here: On Thu, Apr 12, 2018 at 6:32 PM, jb44 wrote: > Then I can add the alluxio client library like so: > sparkSession.conf.set("spark.driver.extraClassPath", ALLUXIO_SPARK_CLIENT) First one, you can't modify JVM configuration after it
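
The class path has to be set before the JVM starts instead, e.g. (path is a placeholder):

    spark-submit --master local[*] \
      --driver-class-path /path/to/alluxio-spark-client.jar \
      ...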

Re: Spark on Kubernetes (minikube) 2.3 fails with class not found exception

2018-04-10 Thread Marcelo Vanzin
This is the problem: > :/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar;/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar Seems like some code is confusing things when mixing OSes. It's using the Windows separator when building a command line to be run on a Linux host. On Tue, Apr

Re: all spark settings end up being system properties

2018-03-30 Thread Marcelo Vanzin
Why: it's part historical, part "how else would you do it". SparkConf needs to read properties passed on the command line, but SparkConf is something that user code instantiates, so we can't easily make it read data from arbitrary locations. You could use thread locals and other tricks, but user
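
A rough illustration of that round trip:

    import org.apache.spark.SparkConf

    // Settings from spark-submit / spark-defaults end up as system
    // properties, and a default SparkConf picks up every "spark.*" one.
    System.setProperty("spark.foo.bar", "42") // stand-in for --conf
    val conf = new SparkConf()                // loadDefaults = true
    assert(conf.get("spark.foo.bar") == "42")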

Re: Local dirs

2018-03-26 Thread Marcelo Vanzin
On Mon, Mar 26, 2018 at 1:08 PM, Gauthier Feuillen wrote: > Is there a way to change this value without changing yarn-site.xml ? No. Local dirs are defined by the NodeManager, and Spark cannot override them. -- Marcelo

Re: Spark logs compression

2018-03-26 Thread Marcelo Vanzin
On Mon, Mar 26, 2018 at 11:01 AM, Fawze Abujaber wrote: > Weird, I just ran spark-shell and it's log is comprised but my spark jobs > that scheduled using oozie is not getting compressed. Ah, then it's probably a problem with how Oozie is generating the config for the Spark

Re: Spark logs compression

2018-03-26 Thread Marcelo Vanzin
/application_1522085988298_0002.snappy On Mon, Mar 26, 2018 at 10:48 AM, Fawze Abujaber <fawz...@gmail.com> wrote: > I distributed this config to all the nodes cross the cluster and with no > success, new spark logs still uncompressed. > > On Mon, Mar 26, 2018 at 8:12 PM, M

Re: Spark logs compression

2018-03-26 Thread Marcelo Vanzin
ration > > On Mon, 26 Mar 2018 at 20:05 Marcelo Vanzin <van...@cloudera.com> wrote: >> >> If the spark-defaults.conf file in the machine where you're starting >> the Spark app has that config, then that's all that should be needed. >> >> On Mon, Mar 26

Re: Spark logs compression

2018-03-26 Thread Marcelo Vanzin
ompressed but I don’t , do I > need to perform restart to spark or Yarn? > > On Mon, 26 Mar 2018 at 19:53 Marcelo Vanzin <van...@cloudera.com> wrote: >> >> Log compression is a client setting. Doing that will make new apps >> write event logs in compressed form

Re: Spark logs compression

2018-03-26 Thread Marcelo Vanzin
Log compression is a client setting. Doing that will make new apps write event logs in compressed format. The SHS doesn't compress existing logs. On Mon, Mar 26, 2018 at 9:17 AM, Fawze Abujaber wrote: > Hi All, > > I'm trying to compress the logs at Spark history server, i

Re: HadoopDelegationTokenProvider

2018-03-21 Thread Marcelo Vanzin
They should be available in the current user. UserGroupInformation.getCurrentUser().getCredentials() On Wed, Mar 21, 2018 at 7:32 AM, Jorge Machado wrote: > Hey spark group, > > I want to create a Delegation Token Provider for Accumulo I have One > Question: > > How can I get the
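
A small sketch of inspecting those credentials (plain Hadoop security APIs):

    import org.apache.hadoop.security.UserGroupInformation
    import scala.collection.JavaConverters._

    val creds = UserGroupInformation.getCurrentUser.getCredentials
    creds.getAllTokens.asScala.foreach { t =>
      println(s"token kind=${t.getKind} service=${t.getService}")
    }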

Re: Accessing a file that was passed via --files to spark submit

2018-03-19 Thread Marcelo Vanzin
From spark-submit -h: --files FILES Comma-separated list of files to be placed in the working directory of each executor. File paths of these files in executors can be accessed via SparkFiles.get(fileName). On Sun, Mar
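
So inside a task the bare file name resolves to the local copy, roughly (file name is a placeholder):

    import org.apache.spark.SparkFiles

    // After: spark-submit --files /local/path/lookup.json ...
    val path = SparkFiles.get("lookup.json")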

Re: How to run spark shell using YARN

2018-03-12 Thread Marcelo Vanzin
ineBufferedStream: stdout: at > javax.security.auth.Subject.doAs(Subject.java:422) > 18/03/13 00:19:13 INFO LineBufferedStream: stdout: at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) > 18/03/13 00:19:13 INFO LineBufferedStream: stdout: at > org.apache.hadoop.ipc.Server$Handler.run(S

Re: How to run spark shell using YARN

2018-03-12 Thread Marcelo Vanzin
That's not an error, just a warning. The docs [1] have more info about the config options mentioned in that message. [1] http://spark.apache.org/docs/latest/running-on-yarn.html On Mon, Mar 12, 2018 at 4:42 PM, kant kodali wrote: > Hi All, > > I am trying to use YARN for the

Re: [spark-sql] Custom Query Execution listener via conf properties

2018-02-16 Thread Marcelo Vanzin
According to https://issues.apache.org/jira/browse/SPARK-19558 this feature was added in 2.3. On Fri, Feb 16, 2018 at 12:43 AM, kurian vs wrote: > Hi, > > I was trying to create a custom Query execution listener by extending the >

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-04 Thread Marcelo Vanzin
On Wed, Jan 3, 2018 at 8:18 PM, John Zhuge wrote: > Something like: > > Note: When running Spark on YARN, environment variables for the executors > need to be set using the spark.yarn.executorEnv.[EnvironmentVariableName] > property in your conf/spark-defaults.conf file or
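
In spark-defaults.conf that looks like this (variable names and values are examples; the AM-side equivalent is spark.yarn.appMasterEnv):

    spark.yarn.executorEnv.JAVA_HOME   /usr/lib/jvm/java-8
    spark.yarn.appMasterEnv.MY_SETTING some-value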

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-03 Thread Marcelo Vanzin
n? > > > On Wed, Jan 3, 2018 at 9:59 AM, Marcelo Vanzin <van...@cloudera.com> wrote: >> >> On Tue, Jan 2, 2018 at 10:57 PM, John Zhuge <jzh...@apache.org> wrote: >> > I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop 2.7.3 cluster. Is >&

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-03 Thread Marcelo Vanzin
On Tue, Jan 2, 2018 at 10:57 PM, John Zhuge wrote: > I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop 2.7.3 cluster. Is > spark-env.sh sourced when starting the Spark AM container or the executor > container? No, it's not. -- Marcelo

Re: flatMap() returning large class

2017-12-14 Thread Marcelo Vanzin
This sounds like something mapPartitions should be able to do, not sure if there's an easier way. On Thu, Dec 14, 2017 at 10:20 AM, Don Drake wrote: > I'm looking for some advice when I have a flatMap on a Dataset that is > creating and returning a sequence of a new case
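
A rough sketch of the mapPartitions variant, given a Dataset ds (ExpensiveHelper and expand are hypothetical; assumes spark.implicits._ is in scope for the result encoder):

    val out = ds.mapPartitions { rows =>
      // Build the heavy object once per partition, not once per record.
      val helper = new ExpensiveHelper()
      rows.flatMap(r => helper.expand(r))
    }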

Re: Why do I see five attempts on my Spark application

2017-12-13 Thread Marcelo Vanzin
On Wed, Dec 13, 2017 at 11:21 AM, Toy wrote: > I'm wondering why am I seeing 5 attempts for my Spark application? Does Spark > application restart itself? It restarts itself if it fails (up to a limit that can be configured either per Spark application or globally in
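
The per-application knob on YARN is spark.yarn.maxAppAttempts; the global ceiling is YARN's yarn.resourcemanager.am.max-attempts. For example:

    spark-submit --conf spark.yarn.maxAppAttempts=1 ...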

Re: Loading a spark dataframe column into T-Digest using java

2017-12-11 Thread Marcelo Vanzin
The closure in your "foreach" loop runs in a remote executor, not the local JVM, so it's updating its own copy of the t-digest instance. The one on the driver side is never touched. On Sun, Dec 10, 2017 at 10:27 PM, Himasha de Silva wrote: > Hi, > > I want to load a spark
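
One way to get a result back is to build per-partition instances and merge them on the driver, roughly (createDigest and mergeDigests are hypothetical stand-ins for the t-digest library calls; df is the DataFrame):

    val merged = df.rdd.mapPartitions { rows =>
      val d = createDigest()
      rows.foreach(r => d.add(r.getDouble(0)))
      Iterator(d)                    // one partial digest per partition
    }.reduce((a, b) => mergeDigests(a, b))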

Re: Programmatically get status of job (WAITING/RUNNING)

2017-12-07 Thread Marcelo Vanzin
That's the Spark Master's view of the application. I don't know exactly what it means in the different run modes; I'm more familiar with YARN. But I wouldn't be surprised if, as with others, it mostly tracks the driver's state. On Thu, Dec 7, 2017 at 12:06 PM, bsikander

Re: Programmatically get status of job (WAITING/RUNNING)

2017-12-07 Thread Marcelo Vanzin
On Thu, Dec 7, 2017 at 11:40 AM, bsikander wrote: > For example, if an application wanted 4 executors > (spark.executor.instances=4) but the spark cluster can only provide 1 > executor. This means that I will only receive 1 onExecutorAdded event. Will > the application state

Re: Programmatically get status of job (WAITING/RUNNING)

2017-12-05 Thread Marcelo Vanzin
On Tue, Dec 5, 2017 at 12:43 PM, bsikander wrote: > 2) If I use context.addSparkListener, I can customize the listener but then > I miss the onApplicationStart event. Also, I don't know the Spark's logic to > changing the state of application from WAITING -> RUNNING. I'm not

Re: Programmatically get status of job (WAITING/RUNNING)

2017-12-05 Thread Marcelo Vanzin
SparkLauncher operates at a different layer than Spark applications. It doesn't know about executors or driver or anything, just whether the Spark application was started or not. So it doesn't work for your case. The best option for your case is to install a SparkListener and monitor events. But
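
A minimal listener sketch, assuming a SparkContext sc (event classes per the org.apache.spark.scheduler API):

    import org.apache.spark.scheduler._

    sc.addSparkListener(new SparkListener {
      override def onExecutorAdded(e: SparkListenerExecutorAdded): Unit =
        println(s"executor ${e.executorId} added")
      override def onJobStart(j: SparkListenerJobStart): Unit =
        println(s"job ${j.jobId} started")
    })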

Re: Does the builtin hive jars talk of spark to HiveMetaStore(2.1) without any issues?

2017-11-09 Thread Marcelo Vanzin
I'd recommend against using the built-in jars for a different version of Hive. You don't need to build your own Spark; just set spark.sql.hive.metastore.jars / spark.sql.hive.metastore.version (see documentation). On Thu, Nov 9, 2017 at 2:10 AM, yaooqinn wrote: > Hi, all >
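
The settings in question, roughly (version and path are placeholders; the jars value can also be "builtin" or "maven"):

    spark.sql.hive.metastore.version  2.1.1
    spark.sql.hive.metastore.jars     /opt/hive-2.1.1/lib/*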

Re: HDFS or NFS as a cache?

2017-10-02 Thread Marcelo Vanzin
You don't need to collect data in the driver to save it. The code in the original question doesn't use "collect()", so it's actually doing a distributed write. On Mon, Oct 2, 2017 at 11:26 AM, JG Perrin wrote: > Steve, > > > > If I refer to the collect() API, it says

Re: --jars from spark-submit on master on YARN don't get added properly to the executors - ClassNotFoundException

2017-08-09 Thread Marcelo Vanzin
Jars distributed using --jars are not added to the system classpath, so log4j cannot see them. To work around that, you need to manually add the jar *name* to the driver and executor classpaths: spark.driver.extraClassPath=some.jar spark.executor.extraClassPath=some.jar In client mode you should

Re: Spark2.1 installation issue

2017-07-27 Thread Marcelo Vanzin
Hello, This is a CDH-specific issue, please use the Cloudera forums / support line instead of the Apache group. On Thu, Jul 27, 2017 at 10:54 AM, Vikash Kumar wrote: > I have installed spark2 parcel through cloudera CDH 12.0. I see some issue > there. Look like

Re: running spark application compiled with 1.6 on spark 2.1 cluster

2017-07-27 Thread Marcelo Vanzin
On Wed, Jul 26, 2017 at 10:45 PM, satishl wrote: > is this a supported scenario - i.e., can I run app compiled with spark 1.6 > on a 2.+ spark cluster? In general, no. -- Marcelo

Re: how to set the assignee in JIRA please?

2017-07-24 Thread Marcelo Vanzin
On Mon, Jul 24, 2017 at 6:04 PM, Hyukjin Kwon wrote: > However, I see some JIRAs are assigned to someone time to time. Were those > mistakes or would you mind if I ask when someone is assigned? I'm not sure if there are any guidelines of when to assign; since there has been

Re: how to set the assignee in JIRA please?

2017-07-24 Thread Marcelo Vanzin
We don't generally set assignees. Submit a PR on github and the PR will be linked on JIRA; if your PR is merged, the bug is assigned to you. On Mon, Jul 24, 2017 at 5:57 PM, 萝卜丝炒饭 <1427357...@qq.com> wrote: > Hi all, > If I want to do some work about an issue registered in JIRA, how to set

Re: Spark on Cloudera Configuration (Scheduler Mode = FAIR)

2017-07-21 Thread Marcelo Vanzin
On Fri, Jul 21, 2017 at 5:00 AM, Gokula Krishnan D wrote: > Is there anyway can we setup the scheduler mode in Spark Cluster level > besides application (SC level). That's called the cluster (or resource) manager. e.g., configure separate queues in YARN with a maximum number

Re: Question regarding Sparks new Internal authentication mechanism

2017-07-20 Thread Marcelo Vanzin
Also, things seem to work with all your settings if you disable use of the shuffle service (which also means no dynamic allocation), if that helps you make progress in what you wanted to do. On Thu, Jul 20, 2017 at 4:25 PM, Marcelo Vanzin <van...@cloudera.com> wrote: > Hmm..

Re: Question regarding Sparks new Internal authentication mechanism

2017-07-20 Thread Marcelo Vanzin
thing meaningful. Please find > it attached. Can you please take a quick look, and let me know if you see > anything suspicious ? > > If not, do you think I should open a JIRA for this ? > > Thanks ! > > On Wed, Jul 19, 2017 at 3:14 PM, Marcelo Vanzin <van...@cloudera.com&g

Re: Question regarding Sparks new Internal authentication mechanism

2017-07-19 Thread Marcelo Vanzin
y clue about this ? > > > On Wed, Jul 19, 2017 at 1:13 PM, Marcelo Vanzin <van...@cloudera.com> wrote: >> >> On Wed, Jul 19, 2017 at 1:10 PM, Udit Mehrotra >> <udit.mehrotr...@gmail.com> wrote: >> > Is there any additional configuration I

Re: Question regarding Sparks new Internal authentication mechanism

2017-07-19 Thread Marcelo Vanzin
On Wed, Jul 19, 2017 at 1:10 PM, Udit Mehrotra wrote: > Is there any additional configuration I need for external shuffle besides > setting the following: > spark.network.crypto.enabled true > spark.network.crypto.saslFallback false > spark.authenticate

Re: Question regarding Sparks new Internal authentication mechanism

2017-07-19 Thread Marcelo Vanzin
ing else I am missing, or I can > try differently ? > > > Thanks ! > > > On Wed, Jul 19, 2017 at 12:03 PM, Marcelo Vanzin <van...@cloudera.com> > wrote: >> >> Please include the list on your replies, so others can benefit from >> the discussion too. &

Re: Question regarding Sparks new Internal authentication mechanism

2017-07-19 Thread Marcelo Vanzin
Please include the list on your replies, so others can benefit from the discussion too. On Wed, Jul 19, 2017 at 11:43 AM, Udit Mehrotra wrote: > Hi Marcelo, > > Thanks a lot for confirming that. Can you explain what you mean by upgrading > the version of shuffle

Re: Question regarding Sparks new Internal authentication mechanism

2017-07-19 Thread Marcelo Vanzin
On Wed, Jul 19, 2017 at 11:19 AM, Udit Mehrotra wrote: > spark.network.crypto.saslFallback false > spark.authenticate true > > This seems to work fine with internal shuffle service of Spark. However, > when in I try it with Yarn’s external shuffle service

Re: Spark history server running on Mongo

2017-07-19 Thread Marcelo Vanzin
On Tue, Jul 18, 2017 at 7:21 PM, Ivan Sadikov wrote: > Repository that I linked to does not require rebuilding Spark and could be > used with current distribution, which is preferable in my case. Fair enough, although that means that you're re-implementing the Spark UI,

Re: Spark history server running on Mongo

2017-07-18 Thread Marcelo Vanzin
See SPARK-18085. That has much of the same goals re: SHS resource usage, and also provides a (currently non-public) API where you could just create a MongoDB implementation if you want. On Tue, Jul 18, 2017 at 12:56 AM, Ivan Sadikov wrote: > Hello everyone! > > I have

Re: running spark job with fat jar file

2017-07-17 Thread Marcelo Vanzin
On 17 July 2017 at 18:46, Marcelo Vanzin <van...@cloudera.com> wrote: >> The YARN backend distributes all files

Re: running spark job with fat jar file

2017-07-17 Thread Marcelo Vanzin
...@gmail.com> wrote: >> >> Hi Mitch >> >> your jar file can be anywhere in the file system, including hdfs. >> >> If using yarn, preferably use cluster mode in terms of deployment. >> >> Yarn will distribute the jar to each container. >> >> Bes

Re: running spark job with fat jar file

2017-07-17 Thread Marcelo Vanzin
Spark distributes your application jar for you. On Mon, Jul 17, 2017 at 8:41 AM, Mich Talebzadeh wrote: > hi guys, > > > an uber/fat jar file has been created to run with spark in CDH yarc client > mode. > > As usual job is submitted to the edge node. > > does the jar

Re: Spark job profiler results showing high TCP cpu time

2017-06-23 Thread Marcelo Vanzin
That thread looks like the connection between the Spark process and jvisualvm. It's expected to show high up when doing sampling if the app is not doing much else. On Fri, Jun 23, 2017 at 10:46 AM, Reth RM wrote: > Running a spark job on local machine and profiler results

Re: SparkAppHandle.Listener.infoChanged behaviour

2017-06-04 Thread Marcelo Vanzin
On Sat, Jun 3, 2017 at 7:16 PM, Mohammad Tariq wrote: > I am having a bit of difficulty in understanding the exact behaviour of > SparkAppHandle.Listener.infoChanged(SparkAppHandle handle) method. The > documentation says : > > Callback for changes in any information that is

Re: SparkAppHandle - get Input and output streams

2017-05-18 Thread Marcelo Vanzin
On Thu, May 18, 2017 at 10:10 AM, Nipun Arora wrote: > I wanted to know how to get the the input and output streams from > SparkAppHandle? You can't. You can redirect the output, but not directly get the streams. -- Marcelo

Re: scalastyle violation on mvn install but not on mvn package

2017-05-17 Thread Marcelo Vanzin
scalastyle runs on the "verify" phase, which is after package but before install. On Wed, May 17, 2017 at 5:47 PM, yiskylee wrote: > ./build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean > package > works, but > ./build/mvn -Pyarn -Phadoop-2.4

Re: Spark Shuffle Encryption

2017-05-12 Thread Marcelo Vanzin
http://spark.apache.org/docs/latest/configuration.html#shuffle-behavior All the options you need to know are there. On Fri, May 12, 2017 at 9:11 AM, Shashi Vishwakarma wrote: > Hi > > I was doing research on encrypting spark shuffle data and found that Spark > 2.1 has
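
For 2.x the relevant knobs are roughly these (see that page for the full list):

    spark.authenticate                        true
    # encrypt data in transit (SASL-based in 2.1)
    spark.authenticate.enableSaslEncryption   true
    # encrypt shuffle files written to local disk
    spark.io.encryption.enabled               true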

Re: --jars does not take remote jar?

2017-05-02 Thread Marcelo Vanzin
On Tue, May 2, 2017 at 9:07 AM, Nan Zhu wrote: > I have no easy way to pass jar path to those forked Spark > applications? (except that I download jar from a remote path to a local temp > dir after resolving some permission issues, etc.?) Yes, that's the only way

Re: --jars does not take remote jar?

2017-05-02 Thread Marcelo Vanzin
Remote jars are added to executors' classpaths, but not the driver's. In YARN cluster mode, they would also be added to the driver's class path. On Tue, May 2, 2017 at 8:43 AM, Nan Zhu wrote: > Hi, all > > For some reason, I tried to pass in a HDFS path to the --jars

Re: Problem with Java and Scala interoperability // streaming

2017-04-19 Thread Marcelo Vanzin
amingContext() from my code in the > previous email. > > On Wed, Apr 19, 2017 at 1:46 PM, Marcelo Vanzin <van...@cloudera.com> wrote: >> >> Why are you not using JavaStreamingContext if you're writing Java? >> >> On Wed, Apr 19, 2017 at 1:42 PM, kant kodali &

Re: Problem with Java and Scala interoperability // streaming

2017-04-19 Thread Marcelo Vanzin
Why are you not using JavaStreamingContext if you're writing Java? On Wed, Apr 19, 2017 at 1:42 PM, kant kodali wrote: > Hi All, > > I get the following errors whichever way I try either lambda or generics. I > am using > Spark 2.1 and Scala 2.11.8 > > > StreamingContext ssc

Re: Monitoring ongoing Spark Job when run in Yarn Cluster mode

2017-03-13 Thread Marcelo Vanzin
It's linked from the YARN RM's Web UI (see the "Application Master" link for the running application). On Mon, Mar 13, 2017 at 6:53 AM, Sourav Mazumder wrote: > Hi, > > Is there a way to monitor an ongoing Spark Job when running in Yarn Cluster > mode ? > > In my

Re: spark-submit question

2017-02-28 Thread Marcelo Vanzin
"success" : true > } > ./test3.sh: line 15: --num-decimals=1000: command not found > ./test3.sh: line 16: --second-argument=Arg2: command not found > > From: Marcelo Vanzin <van...@cloudera.com> > Sent: Tuesday, February 28, 2017 12:17:49 P

Re: spark-submit question

2017-02-28 Thread Marcelo Vanzin
Everything after the jar path is passed to the main class as parameters. So if it's not working you're probably doing something wrong in your code (that you haven't posted). On Tue, Feb 28, 2017 at 7:05 AM, Joe Olson wrote: > For spark-submit, I know I can submit application

Re: SPark - YARN Cluster Mode

2017-02-27 Thread Marcelo Vanzin
> none of my Config settings Is it none of the configs or just the queue? You can't set the YARN queue in cluster mode through code; it has to be set on the command line. It's a chicken & egg problem (in cluster mode, the YARN app is created before your code runs). --properties-file works the
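
I.e. the queue goes on the command line, something like (queue name and properties file are placeholders):

    spark-submit --master yarn --deploy-mode cluster \
      --queue my.queue.name \
      --properties-file my-spark.conf \
      ...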

Re: Jars directory in Spark 2.0

2017-02-01 Thread Marcelo Vanzin
Spark has never shaded dependencies (in the sense of renaming the classes), with a couple of exceptions (Guava and Jetty). So that behavior is nothing new. Spark's dependencies themselves have a lot of other dependencies, so doing that would have limited benefits anyway. On Tue, Jan 31, 2017 at

Re: why does spark web UI keeps changing its port?

2017-01-23 Thread Marcelo Vanzin
s I meant submitting through spark-submit. > > so If I do spark-submit A.jar and spark-submit A.jar again. Do I get two > UI's or one UI'? and which ports do they run on when using the stand alone > mode? > > On Mon, Jan 23, 2017 at 12:19 PM, Marcelo Vanzin <van...@cloudera.com&g

Re: why does spark web UI keeps changing its port?

2017-01-23 Thread Marcelo Vanzin
rote: > hmm..I guess in that case my assumption of "app" is wrong. I thought the app > is a client jar that you submit. no? If so, say I submit multiple jobs then > I get two UI'S? > > On Mon, Jan 23, 2017 at 12:07 PM, Marcelo Vanzin <van...@cloudera.com> > wrote

Re: why does spark web UI keeps changing its port?

2017-01-23 Thread Marcelo Vanzin
rk.apache.org/docs/latest/security.html#standalone-mode-only > > On Mon, Jan 23, 2017 at 11:51 AM, Marcelo Vanzin <van...@cloudera.com> > wrote: >> >> That's the Master, whose default port is 8080 (not 4040). The default >> port for the app's UI is 4040. >>

Re: why does spark web UI keeps changing its port?

2017-01-23 Thread Marcelo Vanzin
That's the Master, whose default port is 8080 (not 4040). The default port for the app's UI is 4040. On Mon, Jan 23, 2017 at 11:47 AM, kant kodali wrote: > I am not sure why Spark web UI keeps changing its port every time I restart > a cluster? how can I make it run always on

Re: Is restarting of SparkContext allowed?

2016-12-15 Thread Marcelo Vanzin
(-dev, +user. dev is for Spark development, not for questions about using Spark.) You haven't posted code here or the actual error. But you might be running into SPARK-15754. Or into other issues with yarn-client mode and "--principal / --keytab" (those have known issues in client mode). If you
