Re: Python to Scala

2016-06-17 Thread Aakash Basu
I don't have sound knowledge of Python, and on the other hand we are working with Spark on Scala, so I don't think running PySpark alongside it will be allowed. The requirement is therefore to convert the code to Scala and use it, but I'm finding it difficult. Did not find a better forum for help than

Re: Python to Scala

2016-06-17 Thread Stephen Boesch
What are you expecting us to do? Yash provided a reasonable approach based on the info you had provided in prior emails. Otherwise you can convert it from Python to Scala yourself, or find someone else who feels comfortable doing it. That kind of inquiry would likely be appropriate on a job board.

Re: Python to Scala

2016-06-17 Thread Aakash Basu
Hey, our complete project is Spark on Scala; I code in Scala for Spark, and though I am new to it, I know it and am still learning. But I need help converting this code to Scala. I have nearly no knowledge of Python, hence I asked the experts here. Hope you get me now. Thanks, Aakash. On

Re: Python to Scala

2016-06-17 Thread Yash Sharma
You could use PySpark to run the Python code on Spark directly. That would cut the effort of learning Scala. https://spark.apache.org/docs/0.9.0/python-programming-guide.html - Thanks, via mobile, excuse brevity. On Jun 18, 2016 2:34 PM, "Aakash Basu" wrote: > Hi all, > >

Python to Scala

2016-06-17 Thread Aakash Basu
Hi all, I have some Python code which I want to convert to Scala for use in a Spark program. I'm not so well acquainted with Python and am learning Scala now. Any Python+Scala expert here? Can someone help me out with this please? Thanks & Regards, Aakash.

Re: Skew data

2016-06-17 Thread Pedro Rodriguez
I am going to take a guess that this means that your partitions within an RDD are not balanced (one or more partitions are much larger than the rest). This would mean a single core would need to do much more work than the rest, leading to poor performance. In general, the way to fix this is to
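
A minimal Scala sketch of the two usual remedies for this kind of skew, repartitioning and salting the hot keys before aggregating; the data, salt factor and partition count are illustrative assumptions rather than anything from the thread:

    import org.apache.spark.{SparkConf, SparkContext}
    import scala.util.Random

    object SkewSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("skew-sketch").setMaster("local[2]"))

        // Toy pair RDD where the key "hot" dominates, producing one oversized partition.
        val rdd = sc.parallelize(Seq.fill(100000)(("hot", 1)) ++ Seq.fill(1000)(("cold", 1)))

        // Remedy 1: simply spread the records over more partitions.
        val rebalanced = rdd.repartition(32)
        println(rebalanced.getNumPartitions)

        // Remedy 2: salt the key so the hot key's records land in several partitions,
        // aggregate per salted key, then strip the salt and aggregate again.
        val saltFactor = 8
        val counts = rdd
          .map { case (k, v) => (s"$k#${Random.nextInt(saltFactor)}", v) }
          .reduceByKey(_ + _)
          .map { case (saltedKey, v) => (saltedKey.split("#")(0), v) }
          .reduceByKey(_ + _)

        println(counts.collect().toSeq)
        sc.stop()
      }
    }

Salting trades one badly skewed shuffle for two better balanced ones, which tends to pay off only when a handful of keys carry most of the data.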

Re: Dataset Select Function after Aggregate Error

2016-06-17 Thread Pedro Rodriguez
Thanks Xinh and Takeshi, I am trying to avoid map since my impression is that it uses a Scala closure and so is not optimized as well as column-wise operations are. Looks like the $ notation is the way to go, thanks for the help. Is there an explanation of how this works? I imagine it is a

Re: Dataset Select Function after Aggregate Error

2016-06-17 Thread Takeshi Yamamuro
Hi, In 2.0, you can say: val ds = Seq[Tuple2[Int, Int]]((1, 0), (2, 0)).toDS ds.groupBy($"_1").count.select($"_1", $"count").show // maropu On Sat, Jun 18, 2016 at 7:53 AM, Xinh Huynh wrote: > Hi Pedro, > > In 1.6.1, you can do: > >> ds.groupBy(_.uid).count().map(_._1)

Spark 2.0 on YARN - Files in config archive not ending up on executor classpath

2016-06-17 Thread Jonathan Kelly
I'm trying to debug a problem in Spark 2.0.0-SNAPSHOT (commit bdf5fe4143e5a1a393d97d0030e76d35791ee248) where Spark's log4j.properties is not getting picked up in the executor classpath (and driver classpath for yarn-cluster mode), so Hadoop's log4j.properties file is taking precedence in the YARN

Spark 2.0 preview - How to configure warehouse for Catalyst? always pointing to /user/hive/warehouse

2016-06-17 Thread Andrew Lee
From branch-2.0, Spark 2.0.0 preview, I found it interesting that no matter what you do by configuring spark.sql.warehouse.dir, it will always pull up the default path, which is /user/hive/warehouse. In the code, I notice that at LOC45

Re: Dataset Select Function after Aggregate Error

2016-06-17 Thread Xinh Huynh
Hi Pedro, In 1.6.1, you can do: >> ds.groupBy(_.uid).count().map(_._1) or >> ds.groupBy(_.uid).count().select($"value".as[String]) It doesn't have the exact same syntax as for DataFrame. http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset It might be different

Dataset Select Function after Aggregate Error

2016-06-17 Thread Pedro Rodriguez
Hi All, I am working on using Datasets in 1.6.1 and eventually 2.0 when it's released. I am running the aggregate code below, where I have a dataset whose rows have a field uid: ds.groupBy(_.uid).count() // res0: org.apache.spark.sql.Dataset[(String, Long)] = [_1: string, _2: bigint] This

Re: Best way to go from RDD to DataFrame of StringType columns

2016-06-17 Thread Jason
We do the exact same approach you proposed for converting horrible text formats (VCF in the bioinformatics domain) into DataFrames. This involves creating the schema dynamically based on the header of the file too. It's simple and easy, but if you need something higher performance you might need
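
A hedged Scala sketch of that approach, building an all-StringType schema from the file's header line and then creating the DataFrame; the path, the delimiter and the 1.6-era SQLContext usage are assumptions made for illustration:

    import org.apache.spark.sql.{Row, SQLContext}
    import org.apache.spark.sql.types.{StringType, StructField, StructType}
    import org.apache.spark.{SparkConf, SparkContext}

    object HeaderSchemaSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("header-schema").setMaster("local[2]"))
        val sqlContext = new SQLContext(sc)

        val delimiter = "\t"                                // assumed delimiter
        val lines = sc.textFile("/path/to/flat_file.txt")   // illustrative path
        val header = lines.first()                          // first line carries the column names

        // One StringType column per header field.
        val schema = StructType(header.split(delimiter).map(name => StructField(name, StringType, nullable = true)))

        // Drop the header and turn every remaining line into a Row of strings,
        // padding short lines so they match the schema width.
        val rows = lines
          .filter(_ != header)
          .map(line => Row.fromSeq(line.split(delimiter, -1).toSeq.padTo(schema.length, "")))

        sqlContext.createDataFrame(rows, schema).printSchema()
      }
    }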

Data Integrity / Model Quality Monitoring

2016-06-17 Thread Benjamin Kim
Has anyone run into this requirement? We have a need to track data integrity and model quality metrics of outcomes, so that we can gauge both whether the data coming in is healthy and whether the models run against it are still performing and not giving faulty results. A nice-to-have would be to graph

Re: Best way to go from RDD to DataFrame of StringType columns

2016-06-17 Thread Everett Anderson
On Fri, Jun 17, 2016 at 1:17 PM, Mich Talebzadeh wrote: > Ok a bit of a challenge. > > Have you tried using databricks stuff?. they can read compressed files and > they might work here? > > val df = >

Running Java-Based Implementation of StreamingKmeans

2016-06-17 Thread Biplob Biswas
Hi, I implemented the streamingKmeans example provided on the Spark website, but in Java. The full implementation is here: http://pastebin.com/CJQfWNvk But I am not getting anything in the output except occasional timestamps like the one below: --- Time:

YARN Application Timeline service with Spark 2.0.0 issue

2016-06-17 Thread Saisai Shao
Hi Community, In Spark 2.0.0 we upgraded to jersey2 ( https://issues.apache.org/jira/browse/SPARK-12154) instead of jersey 1.9, while the whole of Hadoop still sticks to the old version. This brings in some issues when the YARN timeline service is enabled (

Re: Best way to go from RDD to DataFrame of StringType columns

2016-06-17 Thread Mich Talebzadeh
Ok, a bit of a challenge. Have you tried using the Databricks packages? They can read compressed files and might work here. val df = sqlContext.read.format("com.databricks.spark.csv").option("inferSchema", "true").option("header", "true").load("hdfs://rhes564:9000/data/stg/accounts/nw/10124772")

Re: Kerberos setup in Apache spark connecting to remote HDFS/Yarn

2016-06-17 Thread Sudarshan Rangarajan
Hi Ami, Did you try setting spark.yarn.principal and spark.yarn.keytab as configuration properties, passing in their corresponding Kerberos values ? Search for these properties on http://spark.apache.org/docs/latest/running-on-yarn.html to learn more about what's expected for them. Regards,

Re: Best way to go from RDD to DataFrame of StringType columns

2016-06-17 Thread Everett Anderson
On Fri, Jun 17, 2016 at 12:44 PM, Mich Talebzadeh wrote: > Are these mainly in csv format? > Alas, no -- lots of different formats. Many are fixed width files, where I have outside information to know which byte ranges correspond to which columns. Some have odd null

Re: Best way to go from RDD to DataFrame of StringType columns

2016-06-17 Thread Mich Talebzadeh
Are these mainly in csv format? Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw * http://talebzadehmich.wordpress.com On 17 June 2016 at

Best way to go from RDD to DataFrame of StringType columns

2016-06-17 Thread Everett Anderson
Hi, I have a system with files in a variety of non-standard input formats, though they're generally flat text files. I'd like to dynamically create DataFrames of string columns. What's the best way to go from an RDD to a DataFrame of StringType columns? My current plan is - Call map() on the
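
A minimal Scala sketch of that plan for the fixed-width case: map each line to a Row of strings using the externally known byte ranges, then apply an all-StringType schema. The column names, offsets and path are invented for illustration:

    import org.apache.spark.sql.{Row, SQLContext}
    import org.apache.spark.sql.types.{StringType, StructField, StructType}
    import org.apache.spark.{SparkConf, SparkContext}

    object FixedWidthSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("fixed-width").setMaster("local[2]"))
        val sqlContext = new SQLContext(sc)

        // Column name -> (start, end) character offsets, known from outside the file.
        val layout = Seq("id" -> (0, 8), "name" -> (8, 28), "state" -> (28, 30))
        val schema = StructType(layout.map { case (name, _) => StructField(name, StringType, nullable = true) })

        val rows = sc.textFile("/path/to/fixed_width.dat").map { line =>
          // Slice each configured range out of the line, tolerating lines that are too short.
          val values = layout.map { case (_, (start, end)) =>
            if (start >= line.length) "" else line.substring(start, math.min(end, line.length)).trim
          }
          Row.fromSeq(values)
        }

        sqlContext.createDataFrame(rows, schema).show()
      }
    }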

Re: Kerberos setup in Apache spark connecting to remote HDFS/Yarn

2016-06-17 Thread akhandeshi
A little more progress... I added a few environment variables; now I get the following error message: InvocationTargetException: Can't get Master Kerberos principal for use as renewer -> [Help 1]

Running Java Implementation of StreamingKmeans

2016-06-17 Thread Biplob Biswas
Hi, I implemented the streamingKmeans example provided on the Spark website, but in Java. The full implementation is here: http://pastebin.com/CJQfWNvk But I am not getting anything in the output except occasional timestamps like the one below: --- Time:

Re: Spark UI shows finished when job had an error

2016-06-17 Thread Mich Talebzadeh
The Spark GUI runs by default on port 4040, and if a job crashes (assuming you meant there was an issue with spark-submit), then the GUI will disconnect. The GUI is not there for diagnostics; it reports statistics. My inclination would be to look at the YARN log files, assuming you are using YARN as your

Re: Spark UI shows finished when job had an error

2016-06-17 Thread Gourav Sengupta
Hi, Can you please see the query plan (in case you are using a query)? There is a very high chance that the query was broken into multiple steps and only a subsequent step failed. Regards, Gourav Sengupta On Fri, Jun 17, 2016 at 2:49 PM, Sumona Routh wrote: > Hi there, >

Re: Spark UI shows finished when job had an error

2016-06-17 Thread Jacek Laskowski
Hi, How do you access Cassandra? Could that connector not have sent a SparkListenerEvent to inform about failure? Jacek On 17 Jun 2016 3:50 p.m., "Sumona Routh" wrote: > Hi there, > Our Spark job had an error (specifically the Cassandra table definition > did not match what

Re: What is the interpretation of Cores in Spark doc

2016-06-17 Thread Mich Talebzadeh
Great reply, everyone. Just confining this to the current subject matter: Spark and the use of CPU allocation. We have spark-submit parameters: Local mode ${SPARK_HOME}/bin/spark-submit \ --num-executors 1 \ --master local[2] \ ## two cores And that --master local[k] on

Spark UI shows finished when job had an error

2016-06-17 Thread Sumona Routh
Hi there, Our Spark job had an error (specifically, the Cassandra table definition did not match what was in Cassandra), which threw an exception that was logged to our spark-submit log. However, the UI never showed any failed stage or job. It appeared as if the job finished without error, which is

Re: Error Running SparkPi.scala Example

2016-06-17 Thread Krishna Kalyan
Hi Jacek, Maven build output *mvn clean install* [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 30:12 min [INFO] Finished at:

Re: spark-xml - xml parsing when rows only have attributes

2016-06-17 Thread VG
Great.. thanks for pointing this out. On Fri, Jun 17, 2016 at 6:21 PM, Ted Yu wrote: > Please see https://github.com/databricks/spark-xml/issues/92 > > On Fri, Jun 17, 2016 at 5:19 AM, VG wrote: > >> I am using spark-xml for loading data and creating

Unable to kill spark app gracefully. Unable to stop driver in cluster mode

2016-06-17 Thread Ravi Agrawal
Hi, While working on Spark 1.6.1, I ran into an issue with closing the Spark app. I tried it with deploy-mode as client as well as cluster: Firstly, deploy-mode : client Ran the app using below command: /usr/local/spark/spark-1.6.1-bin-hadoop2.6/bin/spark-submit

Unable to kill spark app gracefully. Unable to stop driver in cluster mode

2016-06-17 Thread Ravi Agrawal
Hi, While working on Spark 1.6.1, I ran into an issue with closing the Spark app. I tried it with deploy-mode as client as well as cluster: Firstly, deploy-mode : client Ran the app using below command: /usr/local/spark/spark-1.6.1-bin-hadoop2.6/bin/spark-submit --supervise

Re: spark-xml - xml parsing when rows only have attributes

2016-06-17 Thread Ted Yu
Please see https://github.com/databricks/spark-xml/issues/92 On Fri, Jun 17, 2016 at 5:19 AM, VG wrote: > I am using spark-xml for loading data and creating a data frame. > > If xml element has sub elements and values, then it works fine. Example > if the xml element is like

RE: spark job automatically killed without rhyme or reason

2016-06-17 Thread Alexander Kapustin
Hi Zhiliang, Yes, finding the exact reason for a failure is very difficult. We had an issue with similar behavior; due to limited time for investigation, we reduced the amount of processed data and the problem went away. Some points which may help you in your investigation: · If you start

spark-xml - xml parsing when rows only have attributes

2016-06-17 Thread VG
I am using spark-xml for loading data and creating a data frame. If an XML element has sub-elements and values, then it works fine, for example if the XML element is like test; however, if the XML element is bare, with just attributes, then it does not work. Any suggestions? It does not load the

Re: spark job automatically killed without rhyme or reason

2016-06-17 Thread Zhiliang Zhu
Hi Alexander, is your YARN userlog just the executor log? Those logs seem a little difficult to use to pinpoint exactly where things went wrong, since sometimes a successful job may also have some kinds of errors ... but will repair itself. Spark seems not that stable

Re: spark job automatically killed without rhyme or reason

2016-06-17 Thread Zhiliang Zhu
Hi Alexander, Thanks a lot for your reply. Yes, it was submitted via YARN. Do you mean the executor log file obtained by way of yarn logs -applicationId id? In this file, in both some containers' stdout and stderr: 16/06/17 14:05:40 INFO client.TransportClientFactory: Found inactive connection to

Re: spark job automatically killed without rhyme or reason

2016-06-17 Thread Zhiliang Zhu
Hi Alexander, is your YARN userlog just the executor log? Those logs seem a little difficult to use to pinpoint exactly where things went wrong, since sometimes a successful job may also have some kinds of errors ... but will repair itself. Spark seems not that stable currently ... Thank you

Re: What is the interpretation of Cores in Spark doc

2016-06-17 Thread Robin East
Agreed, it's a worthwhile discussion (and interesting, IMO). This is a section from your original post: > It is about the terminology or interpretation of that in Spark doc. > > This is my understanding of cores and threads. > > Cores are physical cores. Threads are virtual cores. At least as

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Siva A
Use spark-xml version 0.3.3: com.databricks spark-xml_2.10 0.3.3 On Fri, Jun 17, 2016 at 4:25 PM, VG wrote: > Hi Siva > > This is what i have for jars. Did you manage to run with these or > different versions ? > > > > org.apache.spark > spark-core_2.10 > 1.6.1 > > >

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread VG
It proceeded with the jars I mentioned. However, no data is getting loaded into the data frame... sob sob :( On Fri, Jun 17, 2016 at 4:25 PM, VG wrote: > Hi Siva > > This is what i have for jars. Did you manage to run with these or > different versions ? > > > > org.apache.spark >

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread VG
Hi Siva, This is what I have for jars. Did you manage to run with these or with different versions? org.apache.spark:spark-core_2.10:1.6.1, org.apache.spark:spark-sql_2.10:1.6.1, com.databricks:spark-xml_2.10:0.2.0, org.scala-lang:scala-library:2.10.6 Thanks VG On Fri, Jun 17, 2016 at 4:16

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Siva A
Hi Marco, I did run it in an IDE (IntelliJ) as well, and it works fine. VG, make sure the right jar is in the classpath. --Siva On Fri, Jun 17, 2016 at 4:11 PM, Marco Mistroni wrote: > and your eclipse path is correct? > i suggest, as Siva did before, to build your jar and run it via >

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Marco Mistroni
And your Eclipse path is correct? I suggest, as Siva did before, building your jar and running it via spark-submit, specifying the --packages option. It's as simple as running this command: spark-submit --packages com.databricks:spark-xml_: --class Indeed, if you have only these lines to run,

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Siva A
Try to import the class and see if you get a compilation error: import com.databricks.spark.xml Siva On Fri, Jun 17, 2016 at 4:02 PM, VG wrote: > nopes. eclipse. > > > On Fri, Jun 17, 2016 at 3:58 PM, Siva A wrote: > >> If you are running

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread VG
Nope, Eclipse. On Fri, Jun 17, 2016 at 3:58 PM, Siva A wrote: > If you are running from IDE, Are you using Intellij? > > On Fri, Jun 17, 2016 at 3:20 PM, Siva A wrote: > >> Can you try to package as a jar and run using spark-submit >> >>

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Siva A
If you are running from an IDE, are you using IntelliJ? On Fri, Jun 17, 2016 at 3:20 PM, Siva A wrote: > Can you try to package as a jar and run using spark-submit > > Siva > > On Fri, Jun 17, 2016 at 3:17 PM, VG wrote: > >> I am trying to run from IDE

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Siva A
Can you try to package it as a jar and run it using spark-submit? Siva On Fri, Jun 17, 2016 at 3:17 PM, VG wrote: > I am trying to run from IDE and everything else is working fine. > I added spark-xml jar and now I ended up into this dependency > > 6/06/17 15:15:57 INFO

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread VG
I am trying to run from an IDE and everything else is working fine. I added the spark-xml jar and now I have run into this dependency problem: 16/06/17 15:15:57 INFO BlockManagerMaster: Registered BlockManager Exception in thread "main" *java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class*

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread VG
Hi Siva, I still get a similar exception (See the highlighted section - It is looking for DataSource) 16/06/17 15:11:37 INFO BlockManagerMaster: Registered BlockManager Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: xml. Please find packages at

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Marco Mistroni
So, are you using spark-submit or spark-shell? You will need to launch either by passing the --packages option (like in the example below for spark-csv). You will need to know --packages com.databricks:spark-xml_: hth On Fri, Jun 17, 2016 at 10:20 AM, VG wrote: > Apologies

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Siva A
If it's not working, add the package list while executing spark-submit/spark-shell, like below: $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-xml_2.10:0.3.3 $SPARK_HOME/bin/spark-submit --packages com.databricks:spark-xml_2.10:0.3.3 On Fri, Jun 17, 2016 at 2:56 PM, Siva A

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Siva A
Just try to use "xml" as the format, like below: SQLContext sqlContext = new SQLContext(sc); DataFrame df = sqlContext.read() .format("xml") .option("rowTag", "row") .load("A.xml"); FYR: https://github.com/databricks/spark-xml --Siva On Fri, Jun

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread VG
Apologies for that. I am trying to use spark-xml to load data from an XML file. Here is the exception: 16/06/17 14:49:04 INFO BlockManagerMaster: Registered BlockManager Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: org.apache.spark.xml. Please find

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Marco Mistroni
Too little info. It'll help if you can post the exception, show your sbt file (if you are using sbt), and provide minimal details on what you are doing. kr On Fri, Jun 17, 2016 at 10:08 AM, VG wrote: > Failed to find data source: com.databricks.spark.xml > > Any suggestions

Re: spark sql broadcast join ?

2016-06-17 Thread Takeshi Yamamuro
Hi, Spark sends the smaller table to all the workers as broadcast variables, and it joins the table partition by partition. By default, the broadcast join kicks in if the table size is under 10MB. See: http://spark.apache.org/docs/1.6.1/sql-programming-guide.html#other-configuration-options // maropu
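
A small Scala sketch of both knobs: raising spark.sql.autoBroadcastJoinThreshold above the 10MB default, and forcing the hint explicitly with the broadcast() function. It is written against the 1.6 DataFrame API and the tables are toy placeholders:

    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.functions.broadcast
    import org.apache.spark.{SparkConf, SparkContext}

    object BroadcastJoinSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("broadcast-join").setMaster("local[2]"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // Raise the automatic broadcast threshold from the 10MB default to roughly 50MB.
        sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", (50 * 1024 * 1024).toString)

        val large = sc.parallelize(1 to 1000000).map(i => (i % 1000, i)).toDF("key", "value")
        val small = sc.parallelize(0 until 1000).map(i => (i, s"dim_$i")).toDF("key", "label")

        // Force a broadcast hash join regardless of the optimizer's size estimate.
        val joined = large.join(broadcast(small), Seq("key"))
        joined.explain()   // the physical plan should show a BroadcastHashJoin
      }
    }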

java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread VG
Failed to find data source: com.databricks.spark.xml Any suggestions to resolve this?

binding two data frame

2016-06-17 Thread pseudo oduesp
Hi, in R we have functions named cbind and rbind for data frames. How can I reproduce these functions in PySpark? df1.col1 df1.col2 df1.col3 plus df2.col1 df2.col2 df2.col3; final result: a new data frame with df1.col1 df1.col2 df1.col3 df2.col1 df2.col2 df2.col3. Thanks
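
The question is about PySpark, but the two ideas translate directly: rbind is a union of frames with identical schemas, and cbind can be emulated by zipping a row index onto each frame and joining on it. A hedged Scala sketch of that idea, where the column names and the _row_idx helper column are invented for illustration:

    import org.apache.spark.sql.{DataFrame, Row, SQLContext}
    import org.apache.spark.sql.types.{LongType, StructField, StructType}
    import org.apache.spark.{SparkConf, SparkContext}

    object BindSketch {
      // rbind: stack two frames that share the same schema.
      def rbind(a: DataFrame, b: DataFrame): DataFrame = a.unionAll(b)

      // cbind: pair rows positionally by zipping an index onto each frame and joining on it.
      // An explicit zipWithIndex is used because monotonically increasing ids are not
      // guaranteed to line up positionally across two different frames.
      def cbind(a: DataFrame, b: DataFrame, sqlContext: SQLContext): DataFrame = {
        def withRowIndex(df: DataFrame): DataFrame = {
          val schema = StructType(df.schema.fields :+ StructField("_row_idx", LongType, nullable = false))
          val rows = df.rdd.zipWithIndex.map { case (row, idx) => Row.fromSeq(row.toSeq :+ idx) }
          sqlContext.createDataFrame(rows, schema)
        }
        withRowIndex(a).join(withRowIndex(b), "_row_idx").drop("_row_idx")
      }

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("bind-sketch").setMaster("local[2]"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        val df1 = sc.parallelize(Seq((1, "a"), (2, "b"))).toDF("col1", "col2")
        val df2 = sc.parallelize(Seq((10.0, true), (20.0, false))).toDF("col3", "col4")

        cbind(df1, df2, sqlContext).show()   // col1, col2, col3, col4 side by side
        rbind(df1, df1).show()               // rows of df1 stacked twice
      }
    }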

Custom DataFrame filter

2016-06-17 Thread Леонид Поляков
Hi all! Spark 1.6.1. Does anyone know how to implement a custom DF filter to be later pushed down to a custom data source? In short, I've managed to create a custom Expression and implicitly add methods using it to the Column class, but I am stuck at the point where the Expression must be converted to a Filter by

update data frame inside function

2016-06-17 Thread pseudo oduesp
Hi, how can I update a data frame inside a function? Why? I have to apply StringIndexer multiple times: I tried a Pipeline, but it is still extremely slow for 84 columns to be StringIndexed, each with 10 modalities, on a data frame with 21 million rows; I need 15 hours of processing. Now I want to try
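
Since DataFrames are immutable, "updating" one inside a function really means reassigning the result of each transformation, which a foldLeft expresses cleanly. A hedged Scala sketch of that pattern using the 1.6 ml.feature API; the output column suffix is illustrative:

    import org.apache.spark.ml.feature.StringIndexer
    import org.apache.spark.sql.DataFrame

    object UpdateFrameSketch {
      // Apply a StringIndexer to every listed column, feeding each step's output into the
      // next; the fold replaces the in-place update a mutable API would offer.
      def indexColumns(df: DataFrame, columns: Seq[String]): DataFrame =
        columns.foldLeft(df) { (current, col) =>
          new StringIndexer()
            .setInputCol(col)
            .setOutputCol(s"${col}_indexed")
            .fit(current)
            .transform(current)
        }
    }

Usage would look like val indexed = indexColumns(df, Seq("Feature1", "Feature2")).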

RE: spark job automatically killed without rhyme or reason

2016-06-17 Thread Alexander Kapustin
Hi, Did you submit the Spark job via YARN? In some cases (probably memory configuration), YARN can kill the containers where Spark tasks are executed. In this situation, please check the YARN userlogs for more information… -- WBR, Alexander From: Zhiliang Zhu Sent: 17
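
For reference on the memory angle, these are the Spark-on-YARN settings that container kills are most often traced back to; a hedged sketch with placeholder values, and the same keys can equally be passed as --conf options to spark-submit:

    import org.apache.spark.{SparkConf, SparkContext}

    object MemoryConfSketch {
      def main(args: Array[String]): Unit = {
        // Knobs worth checking when YARN kills executor containers; values are placeholders.
        val conf = new SparkConf()
          .setAppName("memory-conf-sketch")
          .set("spark.executor.memory", "4g")                 // executor heap
          .set("spark.yarn.executor.memoryOverhead", "1024")  // off-heap headroom, in MB
          .set("spark.executor.cores", "2")

        val sc = new SparkContext(conf)
        // ... job body ...
        sc.stop()
      }
    }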

Stringindexers on multiple columns >1000

2016-06-17 Thread pseudo oduesp
Hi, I want to apply string indexers on multiple columns, but when I use StringIndexer and a Pipeline it takes a long time. Indexer = StringIndexer(inputCol="Feature1", outputCol="indexed1") is fine in practice for one, two or ten columns, but when you have more than 1000, how can you do it? Thanks
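
A hedged Scala sketch of generating the stages instead of writing more than a thousand of them out by hand: build one StringIndexer per column and hand the whole array to a single Pipeline. The output column prefix is illustrative, and this addresses the boilerplate rather than the slowness reported above:

    import org.apache.spark.ml.{Pipeline, PipelineStage}
    import org.apache.spark.ml.feature.StringIndexer
    import org.apache.spark.sql.DataFrame

    object ManyIndexersSketch {
      def indexAll(df: DataFrame, columns: Seq[String]): DataFrame = {
        // One StringIndexer per column, generated from the column list.
        val stages: Array[PipelineStage] = columns.map { col =>
          new StringIndexer().setInputCol(col).setOutputCol(s"indexed_$col")
        }.toArray

        new Pipeline().setStages(stages).fit(df).transform(df)
      }
    }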

Re: converting timestamp from UTC to many time zones

2016-06-17 Thread Davies Liu
The DataFrame API does not support this use case, but you can still use SQL to do it: df.selectExpr("from_utc_timestamp(start, tz) as testthis") On Thu, Jun 16, 2016 at 9:16 AM, ericjhilton wrote: > This is using python with Spark 1.6.1 and dataframes. > > I have
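
A hedged Scala sketch of that workaround, showing why the expression form matters: it can reference a per-row time zone column, whereas the typed DataFrame function only accepts a literal zone string. The sample rows and column names are illustrative:

    import java.sql.Timestamp
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.{SparkConf, SparkContext}

    object UtcConversionSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("utc-tz").setMaster("local[2]"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // Each row carries a UTC timestamp plus the zone it should be rendered in.
        val df = sc.parallelize(Seq(
          (Timestamp.valueOf("2016-06-17 12:00:00"), "America/Los_Angeles"),
          (Timestamp.valueOf("2016-06-17 12:00:00"), "Europe/Berlin")
        )).toDF("start", "tz")

        // The SQL expression form can reference the tz column per row.
        df.selectExpr("start", "tz", "from_utc_timestamp(start, tz) AS local_start").show(false)
      }
    }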

Re: ImportError: No module named numpy

2016-06-17 Thread Bhupendra Mishra
The issue has been fixed. After a lot of digging around, I finally found a pretty simple thing causing this problem: it was a permission issue on the Python libraries. The user I was logged in as did not have enough permission to read/execute the following Python libraries.

Re: spark job automatically killed without rhyme or reason

2016-06-17 Thread Zhiliang Zhu
Has anyone ever met a similar problem? It is quite strange ... On Friday, June 17, 2016 2:13 PM, Zhiliang Zhu wrote: Hi All, I have a big job which takes more than one hour to run in full; however, it quite unreasonably exits & finishes

Re: Limit pyspark.daemon threads

2016-06-17 Thread agateaaa
There is only one executor on each worker. I see one pyspark.daemon, but when the streaming job starts a batch I see that it spawns 4 other pyspark.daemon processes. After the batch completes, the 4 pyspark.daemon processes die and there is only one left. I think this behavior was introduced by

spark job killed without rhyme or reason

2016-06-17 Thread Zhiliang Zhu
Hi All, I have a big job which takes more than one hour to run in full; however, it quite unreasonably exits and finishes midway (almost 80% of the job actually finished, but not all), without any apparent error or exception in the log. I submitted the same job many times, and it