Re: Is it feasible to build and run Spark on Windows?

2019-12-09 Thread Ping Liu
Super.  Thanks Deepak!

On Mon, Dec 9, 2019 at 6:58 PM Deepak Vohra  wrote:

> Please install Apache Spark on Windows as discussed in Apache Spark on
> Windows - DZone Open Source
> 
>
> Apache Spark on Windows - DZone Open Source
>
> This article explains and provides solutions for some of the most common
> errors developers come across when inst...
> 
>
>
>
> On Monday, December 9, 2019, 11:27:53 p.m. UTC, Ping Liu <
> pingpinga...@gmail.com> wrote:
>
>
> Thanks Deepak!  Yes, I want to try it with Docker.  But my AWS account ran
> out of its free period.  Is there a shared EC2 for Spark that we can use for
> free?
>
> Ping
>
>
> On Monday, December 9, 2019, Deepak Vohra  wrote:
> > Haven't tested, but the general procedure is to exclude all guava
> dependencies that are not needed. The hadoop-common dependency does not have
> a dependency on guava according to Maven Repository: org.apache.hadoop »
> hadoop-common
> >
> > Maven Repository: org.apache.hadoop » hadoop-common
> >
> > Apache Spark 2.4 has a dependency on guava 14.
> > If a Docker image for Cloudera Hadoop is used, Spark may be installed
> on Docker for Windows.
> > For Docker on Windows on EC2, refer to Getting Started with Docker for
> Windows - Developer.com
> >
> > Getting Started with Docker for Windows - Developer.com
> >
> > Docker for Windows makes it feasible to run a Docker daemon on Windows
> Server 2016. Learn to harness its power.
> >
> >
> > Conflicting versions are not an issue if Docker is used.
> > "Apache Spark applications usually have a complex set of required
> software dependencies. Spark applications may require specific versions of
> these dependencies (such as Pyspark and R) on the Spark executor hosts,
> sometimes with conflicting versions."
> > Running Spark in Docker Containers on YARN
> >
> > Running Spark in Docker Containers on YARN
> >
> >
> >
> >
> >
> > On Monday, December 9, 2019, 08:37:47 p.m. UTC, Ping Liu <
> pingpinga...@gmail.com> wrote:
> >
> > Hi Deepak,
> > I tried it.  Unfortunately, it still doesn't work.  28.1-jre isn't being
> downloaded, for some reason.  I'll try something else.  Thank you very much for
> your help!
> > Ping
> >
> > On Fri, Dec 6, 2019 at 5:28 PM Deepak Vohra  wrote:
> >
> >  As multiple guava versions are found, exclude guava from all the
> dependencies it could have been pulled in with, and explicitly add a recent
> guava version.
> > <dependency>
> >   <groupId>org.apache.hadoop</groupId>
> >   <artifactId>hadoop-common</artifactId>
> >   <version>3.2.1</version>
> >   <exclusions>
> >     <exclusion>
> >       <groupId>com.google.guava</groupId>
> >       <artifactId>guava</artifactId>
> >     </exclusion>
> >   </exclusions>
> > </dependency>
> >
> > <dependency>
> >   <groupId>com.google.guava</groupId>
> >   <artifactId>guava</artifactId>
> >   <version>28.1-jre</version>
> > </dependency>
> >
> > On Friday, December 6, 2019, 10:12:55 p.m. UTC, Ping Liu <
> pingpinga...@gmail.com> wrote:
> >
> > Hi Deepak,
> > Following your suggestion, I put the exclusion of guava in the topmost POM
> (directly under the Spark home) as follows.
> > 2227-      
> > 2228-      <dependency>
> > 2229-        <groupId>org.apache.hadoop</groupId>
> > 2230:        <artifactId>hadoop-common</artifactId>
> > 2231-        <version>3.2.1</version>
> > 2232-        <exclusions>
> > 2233-          <exclusion>
> > 2234-            <groupId>com.google.guava</groupId>
> > 2235-            <artifactId>guava</artifactId>
> > 2236-          </exclusion>
> > 2237-        </exclusions>
> > 2238-      </dependency>
> > 2239-    
> > 2240-  
> > I also set properties for spark.executor.userClassPathFirst=true and
> spark.driver.userClassPathFirst=true
> > D:\apache\spark>mvn -Pyarn -Phadoop-3.2 -Dhadoop-version=3.2.1
> -Dspark.executor.userClassPathFirst=true
> -Dspark.driver.userClassPathFirst=true -DskipTests clean package
> > and rebuilt spark.
> > But I got the same error when running spark-shell.
> >
> > [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> > [INFO]
> > [INFO] Spark Project Parent POM ... SUCCESS [
> 25.092 s]
> > [INFO] Spark Project Tags . SUCCESS [
> 22.093 s]
> > [INFO] Spark Project Sketch ... SUCCESS [
> 19.546 s]
> > [INFO] Spark Project Local DB . SUCCESS [
> 10.468 s]
> > [INFO] Spark Project Networking ... SUCCESS [
> 17.733 s]
> > [INFO] Spark Project Shuffle Streaming Service  SUCCESS [
>  6.531 s]
> > [INFO] Spark Project Unsafe ... SUCCESS [
> 25.327 s]
> > [INFO] Spark Project Launcher . SUCCESS [
> 27.264 s]
> > [INFO] Spark Project Core . SUCCESS
> [07:59 min]
> > [INFO] Spark Project ML Local Library . SUCCESS
> [01:39 min]
> > [INFO] Spark Project GraphX ... SUCCESS
> [02:08 min]
> > [INFO] Spark Project Streaming  SUCCESS
> [02:56 min]
> > [INFO] Spark Project Catalyst . SUCCESS
> [08:55 min]
> > [INFO] Spark Project SQL .. SUCCESS
> [12:33 min]
> > 

Re: Is it feasible to build and run Spark on Windows?

2019-12-09 Thread Deepak Vohra
 Please install Apache Spark on Windows as discussed in Apache Spark on Windows 
- DZone Open Source

Apache Spark on Windows - DZone Open Source

This article explains and provides solutions for some of the most common errors 
developers come across when inst...

On Monday, December 9, 2019, 11:27:53 p.m. UTC, Ping Liu 
 wrote:  
 
 Thanks Deepak!  Yes, I want to try it with Docker.  But my AWS account ran out 
of its free period.  Is there a shared EC2 for Spark that we can use for free?

Ping


On Monday, December 9, 2019, Deepak Vohra  wrote:
> Haven't tested, but the general procedure is to exclude all guava dependencies 
> that are not needed. The hadoop-common dependency does not have a dependency 
> on guava according to Maven Repository: org.apache.hadoop » hadoop-common
>
> Maven Repository: org.apache.hadoop » hadoop-common
>
> Apache Spark 2.4 has a dependency on guava 14. 
> If a Docker image for Cloudera Hadoop is used, Spark may be installed on 
> Docker for Windows.  
> For Docker on Windows on EC2, refer to Getting Started with Docker for Windows - 
> Developer.com
>
> Getting Started with Docker for Windows - Developer.com
>
> Docker for Windows makes it feasible to run a Docker daemon on Windows Server 
> 2016. Learn to harness its power.
>
>
> Conflicting versions are not an issue if Docker is used.
> "Apache Spark applications usually have a complex set of required software 
> dependencies. Spark applications may require specific versions of these 
> dependencies (such as Pyspark and R) on the Spark executor hosts, sometimes 
> with conflicting versions."
> Running Spark in Docker Containers on YARN
>
> Running Spark in Docker Containers on YARN
>
>
>
>
>
> On Monday, December 9, 2019, 08:37:47 p.m. UTC, Ping Liu 
>  wrote:
>
> Hi Deepak,
> I tried it.  Unfortunately, it still doesn't work.  28.1-jre isn't being 
> downloaded, for some reason.  I'll try something else.  Thank you very much for your help!
> Ping
>
> On Fri, Dec 6, 2019 at 5:28 PM Deepak Vohra  wrote:
>
>  As multiple guava versions are found, exclude guava from all the dependencies 
> it could have been pulled in with, and explicitly add a recent guava version.
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-common</artifactId>
>   <version>3.2.1</version>
>   <exclusions>
>     <exclusion>
>       <groupId>com.google.guava</groupId>
>       <artifactId>guava</artifactId>
>     </exclusion>
>   </exclusions>
> </dependency>
>
> <dependency>
>   <groupId>com.google.guava</groupId>
>   <artifactId>guava</artifactId>
>   <version>28.1-jre</version>
> </dependency>
>
> On Friday, December 6, 2019, 10:12:55 p.m. UTC, Ping Liu 
>  wrote:
>
> Hi Deepak,
> Following your suggestion, I put the exclusion of guava in the topmost POM 
> (directly under the Spark home) as follows.
> 2227-      
> 2228-      <dependency>
> 2229-        <groupId>org.apache.hadoop</groupId>
> 2230:        <artifactId>hadoop-common</artifactId>
> 2231-        <version>3.2.1</version>
> 2232-        <exclusions>
> 2233-          <exclusion>
> 2234-            <groupId>com.google.guava</groupId>
> 2235-            <artifactId>guava</artifactId>
> 2236-          </exclusion>
> 2237-        </exclusions>
> 2238-      </dependency>
> 2239-    
> 2240-  
> I also set properties for spark.executor.userClassPathFirst=true and 
> spark.driver.userClassPathFirst=true
> D:\apache\spark>mvn -Pyarn -Phadoop-3.2 -Dhadoop-version=3.2.1 
> -Dspark.executor.userClassPathFirst=true 
> -Dspark.driver.userClassPathFirst=true -DskipTests clean package
> and rebuilt spark.
> But I got the same error when running spark-shell.
>
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO]
> [INFO] Spark Project Parent POM ... SUCCESS [ 25.092 
> s]
> [INFO] Spark Project Tags . SUCCESS [ 22.093 
> s]
> [INFO] Spark Project Sketch ... SUCCESS [ 19.546 
> s]
> [INFO] Spark Project Local DB . SUCCESS [ 10.468 
> s]
> [INFO] Spark Project Networking ... SUCCESS [ 17.733 
> s]
> [INFO] Spark Project Shuffle Streaming Service  SUCCESS [  6.531 
> s]
> [INFO] Spark Project Unsafe ... SUCCESS [ 25.327 
> s]
> [INFO] Spark Project Launcher . SUCCESS [ 27.264 
> s]
> [INFO] Spark Project Core . SUCCESS [07:59 
> min]
> [INFO] Spark Project ML Local Library . SUCCESS [01:39 
> min]
> [INFO] Spark Project GraphX ... SUCCESS [02:08 
> min]
> [INFO] Spark Project Streaming  SUCCESS [02:56 
> min]
> [INFO] Spark Project Catalyst . SUCCESS [08:55 
> min]
> [INFO] Spark Project SQL .. SUCCESS [12:33 
> min]
> [INFO] Spark Project ML Library ... SUCCESS [08:49 
> min]
> [INFO] Spark Project Tools  SUCCESS [ 16.967 
> s]
> [INFO] Spark Project Hive . SUCCESS [06:15 
> min]
> [INFO] Spark Project Graph API  SUCCESS [ 10.219 
> s]
> [INFO] Spark Project Cypher ... SUCCESS [ 11.952 
> 

Re: Is it feasible to build and run Spark on Windows?

2019-12-09 Thread Ping Liu
Thanks Deepak!  Yes, I want to try it with Docker.  But my AWS account ran
out of its free period.  Is there a shared EC2 for Spark that we can use for
free?

Ping


On Monday, December 9, 2019, Deepak Vohra  wrote:
> Haven't tested, but the general procedure is to exclude all guava
dependencies that are not needed. The hadoop-common dependency does not have
a dependency on guava according to Maven Repository: org.apache.hadoop »
hadoop-common
>
> Maven Repository: org.apache.hadoop » hadoop-common
>
> Apache Spark 2.4 has a dependency on guava 14.
> If a Docker image for Cloudera Hadoop is used, Spark may be installed
on Docker for Windows.
> For Docker on Windows on EC2, refer to Getting Started with Docker for
Windows - Developer.com
>
> Getting Started with Docker for Windows - Developer.com
>
> Docker for Windows makes it feasible to run a Docker daemon on Windows
Server 2016. Learn to harness its power.
>
>
> Conflicting versions are not an issue if Docker is used.
> "Apache Spark applications usually have a complex set of required
software dependencies. Spark applications may require specific versions of
these dependencies (such as Pyspark and R) on the Spark executor hosts,
sometimes with conflicting versions."
> Running Spark in Docker Containers on YARN
>
> Running Spark in Docker Containers on YARN
>
>
>
>
>
> On Monday, December 9, 2019, 08:37:47 p.m. UTC, Ping Liu <
pingpinga...@gmail.com> wrote:
>
> Hi Deepak,
> I tried it.  Unfortunately, it still doesn't work.  28.1-jre isn't being
downloaded, for some reason.  I'll try something else.  Thank you very much for
your help!
> Ping
>
> On Fri, Dec 6, 2019 at 5:28 PM Deepak Vohra  wrote:
>
>  As multiple guava versions are found, exclude guava from all the
dependencies it could have been pulled in with, and explicitly add a recent
guava version.
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-common</artifactId>
>   <version>3.2.1</version>
>   <exclusions>
>     <exclusion>
>       <groupId>com.google.guava</groupId>
>       <artifactId>guava</artifactId>
>     </exclusion>
>   </exclusions>
> </dependency>
>
> <dependency>
>   <groupId>com.google.guava</groupId>
>   <artifactId>guava</artifactId>
>   <version>28.1-jre</version>
> </dependency>
>
> On Friday, December 6, 2019, 10:12:55 p.m. UTC, Ping Liu <
pingpinga...@gmail.com> wrote:
>
> Hi Deepak,
> Following your suggestion, I put the exclusion of guava in the topmost POM
(directly under the Spark home) as follows.
> 2227-      
> 2228-      <dependency>
> 2229-        <groupId>org.apache.hadoop</groupId>
> 2230:        <artifactId>hadoop-common</artifactId>
> 2231-        <version>3.2.1</version>
> 2232-        <exclusions>
> 2233-          <exclusion>
> 2234-            <groupId>com.google.guava</groupId>
> 2235-            <artifactId>guava</artifactId>
> 2236-          </exclusion>
> 2237-        </exclusions>
> 2238-      </dependency>
> 2239-    
> 2240-  
> I also set properties for spark.executor.userClassPathFirst=true and
spark.driver.userClassPathFirst=true
> D:\apache\spark>mvn -Pyarn -Phadoop-3.2 -Dhadoop-version=3.2.1
-Dspark.executor.userClassPathFirst=true
-Dspark.driver.userClassPathFirst=true -DskipTests clean package
> and rebuilt spark.
> But I got the same error when running spark-shell.
>
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO]
> [INFO] Spark Project Parent POM ... SUCCESS [
25.092 s]
> [INFO] Spark Project Tags . SUCCESS [
22.093 s]
> [INFO] Spark Project Sketch ... SUCCESS [
19.546 s]
> [INFO] Spark Project Local DB . SUCCESS [
10.468 s]
> [INFO] Spark Project Networking ... SUCCESS [
17.733 s]
> [INFO] Spark Project Shuffle Streaming Service  SUCCESS [
 6.531 s]
> [INFO] Spark Project Unsafe ... SUCCESS [
25.327 s]
> [INFO] Spark Project Launcher . SUCCESS [
27.264 s]
> [INFO] Spark Project Core . SUCCESS
[07:59 min]
> [INFO] Spark Project ML Local Library . SUCCESS
[01:39 min]
> [INFO] Spark Project GraphX ... SUCCESS
[02:08 min]
> [INFO] Spark Project Streaming  SUCCESS
[02:56 min]
> [INFO] Spark Project Catalyst . SUCCESS
[08:55 min]
> [INFO] Spark Project SQL .. SUCCESS
[12:33 min]
> [INFO] Spark Project ML Library ... SUCCESS
[08:49 min]
> [INFO] Spark Project Tools  SUCCESS [
16.967 s]
> [INFO] Spark Project Hive . SUCCESS
[06:15 min]
> [INFO] Spark Project Graph API  SUCCESS [
10.219 s]
> [INFO] Spark Project Cypher ... SUCCESS [
11.952 s]
> [INFO] Spark Project Graph  SUCCESS [
11.171 s]
> [INFO] Spark Project REPL . SUCCESS [
55.029 s]
> [INFO] Spark Project YARN Shuffle Service . SUCCESS
[01:07 min]
> [INFO] Spark Project YARN . SUCCESS
[02:22 min]
> [INFO] Spark Project Assembly . SUCCESS [
21.483 s]
> [INFO] Kafka 0.10+ Token 

Re: Is it feasible to build and run Spark on Windows?

2019-12-09 Thread Deepak Vohra
 Haven't tested, but the general procedure is to exclude all guava dependencies 
that are not needed. The hadoop-common dependency does not have a dependency on 
guava according to Maven Repository: org.apache.hadoop » hadoop-common

Maven Repository: org.apache.hadoop » hadoop-common

Apache Spark 2.4 has a dependency on guava 14. 
If a Docker image for Cloudera Hadoop is used, Spark may be installed on 
Docker for Windows.  
For Docker on Windows on EC2, refer to Getting Started with Docker for Windows - 
Developer.com

Getting Started with Docker for Windows - Developer.com

Docker for Windows makes it feasible to run a Docker daemon on Windows Server 
2016. Learn to harness its power.

Conflicting versions are not an issue if Docker is used.
"Apache Spark applications usually have a complex set of required software 
dependencies. Spark applications may require specific versions of these 
dependencies (such as Pyspark and R) on the Spark executor hosts, sometimes 
with conflicting versions."
Running Spark in Docker Containers on YARN
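
As a rough illustration of that approach (the cloudera/quickstart image and
its entrypoint below are assumptions for this sketch; any recent Hadoop/Spark
image works the same way), a container with a full Hadoop stack can be
started from Docker for Windows and Spark run inside it:

docker pull cloudera/quickstart
docker run --hostname=quickstart.cloudera --privileged=true -t -i cloudera/quickstart /usr/bin/docker-quickstart

All dependency versions then come from the image, so they cannot conflict
with anything installed on the Windows host.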


On Monday, December 9, 2019, 08:37:47 p.m. UTC, Ping Liu 
 wrote:  
 
 Hi Deepak,
I tried it.  Unfortunately, it still doesn't work.  28.1-jre isn't being 
downloaded, for some reason.  I'll try something else.  Thank you very much for your help!
Ping

On Fri, Dec 6, 2019 at 5:28 PM Deepak Vohra  wrote:

  As multiple guava versions are found, exclude guava from all the dependencies 
it could have been pulled in with, and explicitly add a recent guava version.

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>3.2.1</version>
  <exclusions>
    <exclusion>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>28.1-jre</version>
</dependency>


On Friday, December 6, 2019, 10:12:55 p.m. UTC, Ping Liu 
 wrote:  
 
 Hi Deepak,
Following your suggestion, I put the exclusion of guava in the topmost POM 
(directly under the Spark home) as follows.
2227-      
2228-      <dependency>
2229-        <groupId>org.apache.hadoop</groupId>
2230:        <artifactId>hadoop-common</artifactId>
2231-        <version>3.2.1</version>
2232-        <exclusions>
2233-          <exclusion>
2234-            <groupId>com.google.guava</groupId>
2235-            <artifactId>guava</artifactId>
2236-          </exclusion>
2237-        </exclusions>
2238-      </dependency>
2239-    
2240-  
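
(For reference, whether an exclusion like this actually takes effect can be
checked with the Maven dependency plugin; the command below is illustrative
only and its output was not captured here.)

D:\apache\spark>mvn dependency:tree -Dincludes=com.google.guava:guava

If both guava 14.x and 28.1-jre still show up under different modules, the
exclusion has not propagated to those modules.
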
I also set properties for spark.executor.userClassPathFirst=true and 
spark.driver.userClassPathFirst=true
D:\apache\spark>mvn -Pyarn -Phadoop-3.2 -Dhadoop-version=3.2.1 
-Dspark.executor.userClassPathFirst=true -Dspark.driver.userClassPathFirst=true 
-DskipTests clean package
and rebuilt spark.
But I got the same error when running spark-shell.

[INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
[INFO]
[INFO] Spark Project Parent POM ... SUCCESS [ 25.092 s]
[INFO] Spark Project Tags . SUCCESS [ 22.093 s]
[INFO] Spark Project Sketch ... SUCCESS [ 19.546 s]
[INFO] Spark Project Local DB . SUCCESS [ 10.468 s]
[INFO] Spark Project Networking ... SUCCESS [ 17.733 s]
[INFO] Spark Project Shuffle Streaming Service  SUCCESS [  6.531 s]
[INFO] Spark Project Unsafe ... SUCCESS [ 25.327 s]
[INFO] Spark Project Launcher . SUCCESS [ 27.264 s]
[INFO] Spark Project Core . SUCCESS [07:59 min]
[INFO] Spark Project ML Local Library . SUCCESS [01:39 min]
[INFO] Spark Project GraphX ... SUCCESS [02:08 min]
[INFO] Spark Project Streaming  SUCCESS [02:56 min]
[INFO] Spark Project Catalyst . SUCCESS [08:55 min]
[INFO] Spark Project SQL .. SUCCESS [12:33 min]
[INFO] Spark Project ML Library ... SUCCESS [08:49 min]
[INFO] Spark Project Tools  SUCCESS [ 16.967 s]
[INFO] Spark Project Hive . SUCCESS [06:15 min]
[INFO] Spark Project Graph API  SUCCESS [ 10.219 s]
[INFO] Spark Project Cypher ... SUCCESS [ 11.952 s]
[INFO] Spark Project Graph  SUCCESS [ 11.171 s]
[INFO] Spark Project REPL . SUCCESS [ 55.029 s]
[INFO] Spark Project YARN Shuffle Service . SUCCESS [01:07 min]
[INFO] Spark Project YARN . SUCCESS [02:22 min]
[INFO] Spark Project Assembly . SUCCESS [ 21.483 s]
[INFO] Kafka 0.10+ Token Provider for Streaming ... SUCCESS [ 56.450 s]
[INFO] Spark Integration for Kafka 0.10 ... SUCCESS [01:21 min]
[INFO] Kafka 0.10+ Source for Structured Streaming  SUCCESS [02:33 min]
[INFO] Spark Project Examples . SUCCESS 

Re: [VOTE] Shall we release ORC 1.4.5rc1?

2019-12-09 Thread Owen O'Malley
With four +1's and no -1's the vote passes. I'll promote the release.

Thanks,
   Owen



On Fri, Dec 6, 2019 at 6:12 PM Hyukjin Kwon  wrote:

> +1 (as a Spark user)
>
> On Sat, Dec 7, 2019 at 11:06 AM, Dongjoon Hyun wrote:
>
> > +1 for Apache ORC 1.4.5 release.
> >
> > Thank you for making the release.
> >
> > I'd like to mention some notable changes here.
> > Apache ORC 1.4.5 is not a drop-in replacement for 1.4.4 because of the
> > following.
> >
> >   ORC-498: ReaderImpl and RecordReaderImpl open separate file
> handles.
> >
> > Applications should be updated accordingly. Otherwise, file handle
> > leaks occur.
> > For example, Apache Spark 2.3.5-SNAPSHOT is currently using v1.4.4 and
> > will not work with v1.4.5.
> >
> > In short, there is a breaking change between v1.4.4 and v1.4.5 like the
> > breaking change between v1.5.5 and 1.5.6.
> > For the required change, please refer to Owen's Apache Spark upgrade patch.
> >
> >   [SPARK-28208][BUILD][SQL] Upgrade to ORC 1.5.6 including closing the
> > ORC readers
> >
> > https://github.com/apache/spark/commit/dfb0a8bb048d43f8fd1fb05b1027bd2fc7438dbc
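> >
> > A minimal sketch of the pattern that change requires (illustrative code
> > only, assuming the standard ORC Java API used from Scala; the path is
> > made up):
> >
> > import org.apache.hadoop.conf.Configuration
> > import org.apache.hadoop.fs.Path
> > import org.apache.orc.OrcFile
> >
> > val conf = new Configuration()
> > // After ORC-498 the Reader holds its own file handle, so it must be
> > // closed explicitly in addition to the RecordReader.
> > val reader = OrcFile.createReader(new Path("/tmp/data.orc"),
> >   OrcFile.readerOptions(conf))
> > try {
> >   val rows = reader.rows()
> >   try {
> >     // ... iterate over the rows/batches here ...
> >   } finally {
> >     rows.close()
> >   }
> > } finally {
> >   reader.close()
> > }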
> >
> > Bests,
> > Dongjoon.
> >
> >
> > On Fri, Dec 6, 2019 at 4:19 PM Alan Gates  wrote:
> >
> >> +1.  Did a build on ubuntu 16, checked the signatures and hashes.
> >> Reviewed
> >> the license changes.
> >>
> >> Alan.
> >>
> >> On Fri, Dec 6, 2019 at 1:41 PM Owen O'Malley 
> >> wrote:
> >>
> >> > All,
> >> >Ok, I backported a few more fixes in to rc1:
> >> >
> >> >- ORC-480
> >> >- ORC-552
> >> >- ORC-576
> >> >
> >> >
> >> > Should we release the following artifacts as ORC 1.4.5?
> >> >
> >> > tar: http://home.apache.org/~omalley/orc-1.4.5/
> >> > tag: https://github.com/apache/orc/releases/tag/release-1.4.5rc1
> >> > jiras:
> https://issues.apache.org/jira/browse/ORC/fixforversion/12345479
> >> >
> >> > Thanks!
> >> >
> >>
> >
>


Re: Is it feasible to build and run Spark on Windows?

2019-12-09 Thread Ping Liu
Hi Deepak,

I tried it.  Unfortunately, it still doesn't work.  28.1-jre isn't being
downloaded, for some reason.  I'll try something else.  Thank you very much for
your help!

Ping


On Fri, Dec 6, 2019 at 5:28 PM Deepak Vohra  wrote:

>  As multiple guava versions are found, exclude guava from all the
> dependencies it could have been pulled in with, and explicitly add a recent
> guava version.
>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-common</artifactId>
>   <version>3.2.1</version>
>   <exclusions>
>     <exclusion>
>       <groupId>com.google.guava</groupId>
>       <artifactId>guava</artifactId>
>     </exclusion>
>   </exclusions>
> </dependency>
>
> <dependency>
>   <groupId>com.google.guava</groupId>
>   <artifactId>guava</artifactId>
>   <version>28.1-jre</version>
> </dependency>
>
>
> On Friday, December 6, 2019, 10:12:55 p.m. UTC, Ping Liu <
> pingpinga...@gmail.com> wrote:
>
>
> Hi Deepak,
>
> Following your suggestion, I put the exclusion of guava in the topmost POM
> (directly under the Spark home) as follows.
>
> 2227-      
> 2228-      <dependency>
> 2229-        <groupId>org.apache.hadoop</groupId>
> 2230:        <artifactId>hadoop-common</artifactId>
> 2231-        <version>3.2.1</version>
> 2232-        <exclusions>
> 2233-          <exclusion>
> 2234-            <groupId>com.google.guava</groupId>
> 2235-            <artifactId>guava</artifactId>
> 2236-          </exclusion>
> 2237-        </exclusions>
> 2238-      </dependency>
> 2239-    
> 2240-  
>
> I also set properties for spark.executor.userClassPathFirst=true and
> spark.driver.userClassPathFirst=true
>
> D:\apache\spark>mvn -Pyarn -Phadoop-3.2 -Dhadoop-version=3.2.1
> -Dspark.executor.userClassPathFirst=true
> -Dspark.driver.userClassPathFirst=true -DskipTests clean package
>
> and rebuilt spark.
>
> But I got the same error when running spark-shell.
>
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO]
> [INFO] Spark Project Parent POM ... SUCCESS [
> 25.092 s]
> [INFO] Spark Project Tags . SUCCESS [
> 22.093 s]
> [INFO] Spark Project Sketch ... SUCCESS [
> 19.546 s]
> [INFO] Spark Project Local DB . SUCCESS [
> 10.468 s]
> [INFO] Spark Project Networking ... SUCCESS [
> 17.733 s]
> [INFO] Spark Project Shuffle Streaming Service  SUCCESS [
>  6.531 s]
> [INFO] Spark Project Unsafe ... SUCCESS [
> 25.327 s]
> [INFO] Spark Project Launcher . SUCCESS [
> 27.264 s]
> [INFO] Spark Project Core . SUCCESS [07:59
> min]
> [INFO] Spark Project ML Local Library . SUCCESS [01:39
> min]
> [INFO] Spark Project GraphX ... SUCCESS [02:08
> min]
> [INFO] Spark Project Streaming  SUCCESS [02:56
> min]
> [INFO] Spark Project Catalyst . SUCCESS [08:55
> min]
> [INFO] Spark Project SQL .. SUCCESS [12:33
> min]
> [INFO] Spark Project ML Library ... SUCCESS [08:49
> min]
> [INFO] Spark Project Tools  SUCCESS [
> 16.967 s]
> [INFO] Spark Project Hive . SUCCESS [06:15
> min]
> [INFO] Spark Project Graph API  SUCCESS [
> 10.219 s]
> [INFO] Spark Project Cypher ... SUCCESS [
> 11.952 s]
> [INFO] Spark Project Graph  SUCCESS [
> 11.171 s]
> [INFO] Spark Project REPL . SUCCESS [
> 55.029 s]
> [INFO] Spark Project YARN Shuffle Service . SUCCESS [01:07
> min]
> [INFO] Spark Project YARN . SUCCESS [02:22
> min]
> [INFO] Spark Project Assembly . SUCCESS [
> 21.483 s]
> [INFO] Kafka 0.10+ Token Provider for Streaming ... SUCCESS [
> 56.450 s]
> [INFO] Spark Integration for Kafka 0.10 ... SUCCESS [01:21
> min]
> [INFO] Kafka 0.10+ Source for Structured Streaming  SUCCESS [02:33
> min]
> [INFO] Spark Project Examples . SUCCESS [02:05
> min]
> [INFO] Spark Integration for Kafka 0.10 Assembly .. SUCCESS [
> 30.780 s]
> [INFO] Spark Avro . SUCCESS [01:43
> min]
> [INFO]
> 
> [INFO] BUILD SUCCESS
> [INFO]
> 
> [INFO] Total time:  01:08 h
> [INFO] Finished at: 2019-12-06T11:43:08-08:00
> [INFO]
> 
>
> D:\apache\spark>spark-shell
> 'spark-shell' is not recognized as an internal or external command,
> operable program or batch file.
>
> D:\apache\spark>cd bin
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError:
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> at
> 
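>
> A quick way to see which guava jar spark-shell is actually picking up (an
> illustrative check only; the path below assumes a Scala 2.12 build, where
> the runtime jars land under assembly\target\scala-2.12\jars):
>
> D:\apache\spark>dir /s /b assembly\target\scala-2.12\jars\guava*.jar
>
> If a guava 14.x jar is still listed there, the NoSuchMethodError is
> expected: Preconditions.checkArgument(boolean, String, Object) does not
> exist in guava 14. Note also that spark.executor.userClassPathFirst and
> spark.driver.userClassPathFirst are runtime Spark configurations, so
> passing them as -D flags to mvn does not carry over to spark-shell; they
> would normally be supplied as --conf options to spark-shell or spark-submit.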

Re: No of cores per executor.

2019-12-09 Thread Samik Raychaudhuri

Hi,
Take a look at this video: 
[https://www.youtube.com/watch?v=daXEp4HmS-E]. Pretty dense, but might 
answer some of your questions.
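
As a rough rule of thumb (a common heuristic only, and the numbers below are 
just an illustration): leave about 1 core and 1 GB per node for the OS and 
Hadoop daemons, keep roughly 5 cores per executor for good HDFS throughput, 
and divide what is left. For example, on 10 nodes with 16 cores and 64 GB each:

  usable cores per node = 16 - 1 = 15
  executors per node    = 15 / 5 = 3
  executors in total    = 3 * 10 - 1 = 29  (one slot left for the driver/AM)
  memory per executor   = 63 GB / 3 = 21 GB, minus ~7-10% overhead, so ~19 GB

Measuring your own job around those numbers usually beats any formula.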

Thanks.
-Samik

On 09-Dec-19 4:12 AM, Amit Sharma wrote:
I have set 5 cores per executor. Is there any formula to determine the 
best combination of executors, cores, and memory per core for better 
performance? Also, when I run a local Spark instance in my web jar, I get 
better speed than when running in the cluster.




Thanks
Amit




unsubscribe

2019-12-09 Thread Calvin Tran
unsubscribe

On Dec. 9, 2019 6:59 a.m., "Areg Baghdasaryan (BLOOMBERG/ 731 LEX)" 
 wrote:




unsubscribe

2019-12-09 Thread Areg Baghdasaryan (BLOOMBERG/ 731 LEX)
 

[pyspark 2.3+] broadcast timeout

2019-12-09 Thread Rishi Shah
Hi All,

All of a sudden we recently discovered that all of our auto broadcasts have
been timing out. This started happening in our static Cloudera cluster as
well as on Databricks, and the data has not changed much. Has anyone seen anything
like this before? Any suggestions other than increasing the timeout period
or shutting off broadcast completely by setting the auto broadcast property
to -1?
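
For reference, the two knobs being discussed are spark.sql.broadcastTimeout
(seconds, default 300) and spark.sql.autoBroadcastJoinThreshold (bytes; -1
disables automatic broadcast joins). Passed at submit time they look like
this, with illustrative values only:

  --conf spark.sql.broadcastTimeout=1200
  --conf spark.sql.autoBroadcastJoinThreshold=-1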

-- 
Regards,

Rishi Shah