Re: [Spark Launcher] How to launch parallel jobs?

2017-02-13 Thread Egor Pahomov
on your Hadoop UI and verify that both jobs get enough resources. 2017-02-13 11:07 GMT-08:00 Egor Pahomov <pahomov.e...@gmail.com>: > "But if I increase only executor-cores the finish time is the same". More > experienced ones can correct me if I'm wrong, but as far as I

Re: [Spark Launcher] How to launch parallel jobs?

2017-02-13 Thread Egor Pahomov
"But if I increase only executor-cores the finish time is the same". More experienced ones can correct me if I'm wrong, but as far as I understand: one partition is processed by one Spark task, and a task always runs on one core and is not parallelized across cores. So if you have 5 partitions and

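A minimal spark-shell sketch of that point, for reference (the input path and the 40-partition figure below are made up): with only 5 partitions at most 5 tasks run at once, however many executor cores there are, so raising parallelism means raising the partition count.

    // spark-shell sketch; "sc" is the predefined SparkContext
    val rdd = sc.textFile("hdfs:///tmp/input")   // hypothetical input path
    rdd.getNumPartitions                         // e.g. 5 -> at most 5 concurrent tasks
    val widened = rdd.repartition(40)            // one task per partition, one core per task
    widened.getNumPartitions                     // now up to 40 tasks can run in parallel
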
Re: Union of DStream and RDD

2017-02-11 Thread Egor Pahomov
> On Thu, Feb 9, 2017, 04:58 Egor Pahomov <pahomov.e...@gmail.com> wrote: > >> Just guessing here, but would http://spark.apache.org/ >> docs/latest/streaming-programming-guide.html#basic-sources "*Queue of >> RDDs as a Stream*" work? Basically create DStrea

Re: [Structured Streaming] Using File Sink to store to hive table.

2017-02-11 Thread Egor Pahomov
; wrote: >> >> "Something like that" I've never tried it out myself so I'm only >> guessing having a brief look at the API. >> >> Pozdrawiam, >> Jacek Laskowski >> >> https://medium.com/@jaceklaskowski/ >> Mastering Apache Spark 2.0 https:

Re: [Structured Streaming] Using File Sink to store to hive table.

2017-02-10 Thread Egor Pahomov
at https://twitter.com/jaceklaskowski > > > On Thu, Feb 9, 2017 at 3:55 AM, Egor Pahomov <pahomov.e...@gmail.com> > wrote: > > Jacek, you mean > > http://spark.apache.org/docs/latest/api/scala/index.html# > org.apache.spark.sql.ForeachWriter > > ? I d
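
For reference, a rough sketch of the ForeachWriter API being discussed, against the Spark 2.x Scala interface; the body of process() is a placeholder, not a working Hive writer.

    import org.apache.spark.sql.{ForeachWriter, Row}

    val writer = new ForeachWriter[Row] {
      def open(partitionId: Long, version: Long): Boolean = true  // open a connection/file here
      def process(record: Row): Unit = {
        // placeholder: write the record to the target storage
      }
      def close(errorOrNull: Throwable): Unit = ()                // release resources here
    }

    // wiring, for context only: df.writeStream.foreach(writer).start()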

Re: [Spark-SQL] Hive support is required to select over the following tables

2017-02-08 Thread Egor Pahomov
Just guessing here, but have you built your Spark with "-Phive"? By the way, which version of Zeppelin? 2017-02-08 5:13 GMT-08:00 Daniel Haviv : > Hi, > I'm using Spark 2.1.0 on Zeppelin. > > I can successfully create a table but when I try to select from it I fail: >
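
Related to the "-Phive" guess, a hedged sketch of the session setup that needs Hive support (the table name is a placeholder); enableHiveSupport() throws at runtime if the Hive classes are not on the classpath, e.g. in a build without -Phive.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("hive-support-sketch")
      .enableHiveSupport()   // throws if the Hive classes are not on the classpath
      .getOrCreate()

    spark.sql("SELECT * FROM some_table").show()   // hypothetical table name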

Re: Union of DStream and RDD

2017-02-08 Thread Egor Pahomov
Just guessing here, but would http://spark.apache.org/docs/latest/streaming-programming-guide.html#basic-sources "*Queue of RDDs as a Stream*" work? Basically create a DStream from your RDD and then union it with the other DStream. 2017-02-08 12:32 GMT-08:00 Amit Sela : > Hi all, >
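
A minimal sketch of that "Queue of RDDs as a Stream" suggestion (the local master, the 5-second batch interval and the socket source are arbitrary placeholders):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import scala.collection.mutable

    val conf = new SparkConf().setMaster("local[2]").setAppName("queue-stream-sketch")
    val ssc  = new StreamingContext(conf, Seconds(5))

    val rdd      = ssc.sparkContext.parallelize(Seq(1, 2, 3))
    val fromRdd  = ssc.queueStream(mutable.Queue(rdd))                    // DStream built from the RDD
    val other    = ssc.socketTextStream("localhost", 9999).map(_.length)  // placeholder DStream
    val combined = fromRdd.union(other)                                   // union of the two DStreams
    combined.print()

    // ssc.start(); ssc.awaitTermination()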

Re: [Structured Streaming] Using File Sink to store to hive table.

2017-02-08 Thread Egor Pahomov
Laskowski <ja...@japila.pl>: > Hi, > > Have you considered foreach sink? > > Jacek > > On 6 Feb 2017 8:39 p.m., "Egor Pahomov" <pahomov.e...@gmail.com> wrote: > >> Hi, I'm thinking of using Structured Streaming instead of old streaming, >> but I ne

Re: [Structured Streaming] Using File Sink to store to hive table.

2017-02-06 Thread Egor Pahomov
le all partitioning information in > its own metadata log. Is there a specific reason that you want to store the > information in the Hive Metastore? > > Best, > Burak > > On Mon, Feb 6, 2017 at 11:39 AM, Egor Pahomov <pahomov.e...@gmail.com> > wrote: > >> H

[Structured Streaming] Using File Sink to store to hive table.

2017-02-06 Thread Egor Pahomov
Hi, I'm thinking of using Structured Streaming instead of the old streaming, but I need to be able to save results to a Hive table. The documentation for the file sink (http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-sinks) says: "Supports writes to partitioned tables."
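
For context, a rough sketch of what the partitioned file-sink write looks like in 2.1-era code; every path, the schema and the partition column are placeholders, and whether the Hive metastore then sees those partitions is exactly the open question of this thread.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types._

    val spark  = SparkSession.builder().appName("file-sink-sketch").getOrCreate()
    val schema = new StructType().add("date", StringType).add("value", LongType)

    val events = spark.readStream.schema(schema).json("/tmp/in")   // hypothetical source directory

    val query = events.writeStream
      .format("parquet")
      .option("path", "/warehouse/events")                 // hypothetical table location
      .option("checkpointLocation", "/tmp/checkpoints")    // required by the file sink
      .partitionBy("date")                                 // the "partitioned tables" support
      .start()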

Logs of spark driver in yarn-client mode.

2016-07-06 Thread Egor Pahomov
Hi, I have the following issue: I have Zeppelin set up in yarn-client mode. A notebook stays in the Running state for a long period of time at 0% done, and I do not see even an accepted application in YARN. To understand what's going on, I need the logs of the Spark driver, which is trying to connect to

Re: Thrift JDBC server - why only one per machine and only yarn-client

2016-07-01 Thread Egor Pahomov
What about yarn-cluster mode? 2016-07-01 11:24 GMT-07:00 Egor Pahomov <pahomov.e...@gmail.com>: > Separate bad users with bad queries from good users with good queries. Spark > does not provide such scope separation out of the box. > > 2016-07-01 11:12 GMT-07:00 Jeff Zhang <zjf.

Re: Thrift JDBC server - why only one per machine and only yarn-client

2016-07-01 Thread Egor Pahomov
n Fri, Jul 1, 2016 at 10:59 AM, Egor Pahomov <pahomov.e...@gmail.com> > wrote: > >> Takeshi, of course I used different HIVE_SERVER2_THRIFT_PORT >> Jeff, thanks, I would try, but from your answer I'm getting the feeling, >> that I'm trying some very rare case?

Re: Thrift JDBC server - why only one per machine and only yarn-client

2016-07-01 Thread Egor Pahomov
thrift server on one > machine, then define different SPARK_CONF_DIR, SPARK_LOG_DIR and > SPARK_PID_DIR for your 2 instances of spark thrift server. Not sure if > there are other conflicts, but please try first. > > > On Fri, Jul 1, 2016 at 10:47 AM, Egor Pahomov <p

Re: Thrift JDBC server - why only one per machine and only yarn-client

2016-07-01 Thread Egor Pahomov
ict issue such as port conflict, pid file, log file and > etc, you can run multiple instances of spark thrift server. > > On Fri, Jul 1, 2016 at 9:32 AM, Egor Pahomov <pahomov.e...@gmail.com> > wrote: > >> Hi, I'm using Spark Thrift JDBC server and 2 limitations are really >

Thrift JDBC server - why only one per machine and only yarn-client

2016-07-01 Thread Egor Pahomov
Hi, I'm using the Spark Thrift JDBC server and 2 limitations really bother me: 1) one instance per machine, 2) yarn-client only (not yarn-cluster). Are there any architectural reasons for such limitations? About yarn-client I might understand in theory - the master is the same process as the server, so

Re: 1.6.0: Standalone application: Getting ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory

2016-01-14 Thread Egor Pahomov
really good news, since it's hard to do addJar() properly in an Oozie job. 2016-01-12 17:01 GMT-08:00 Egor Pahomov <pahomov.e...@gmail.com>: > Hi, I'm moving my infrastructure from 1.5.2 to 1.6.0 and experiencing a > serious issue. I successfully updated the spark thrift server from 1.5.2 to &g

1.6.0: Standalone application: Getting ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory

2016-01-12 Thread Egor Pahomov
Hi, I'm moving my infrastructure from 1.5.2 to 1.6.0 and experiencing a serious issue. I successfully updated the spark thrift server from 1.5.2 to 1.6.0, but I have a standalone application which worked fine with 1.5.2 and fails on 1.6.0 with: *NestedThrowables:* *java.lang.ClassNotFoundException:
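
One hedged guess at a workaround, in sbt form: make sure the datanucleus jars that Spark itself ships in lib/ are on the application classpath. The versions below are assumptions taken from a 1.6 distribution; verify against $SPARK_HOME/lib.

    // build.sbt fragment -- versions are assumptions, check the jars in $SPARK_HOME/lib
    libraryDependencies ++= Seq(
      "org.datanucleus" % "datanucleus-api-jdo" % "3.2.6",
      "org.datanucleus" % "datanucleus-core"    % "3.2.10",
      "org.datanucleus" % "datanucleus-rdbms"   % "3.2.9"
    )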

Elastic allocation(spark.dynamicAllocation.enabled) results in task never being executed.

2014-11-14 Thread Egor Pahomov
Hi. I execute ipython notebook + pyspark with spark.dynamicAllocation.enabled = true. Task never ends. Code:

    import sys
    from random import random
    from operator import add

    partitions = 10
    n = 10 * partitions

    def f(_):
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 +
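
For reference, the configuration involved, as a hedged driver-side sketch (the min/max values are arbitrary; dynamic allocation on YARN also needs the external shuffle service enabled on the NodeManagers):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")       // external shuffle service is required
      .set("spark.dynamicAllocation.minExecutors", "1")   // arbitrary example values
      .set("spark.dynamicAllocation.maxExecutors", "10")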

Re: Elastic allocation(spark.dynamicAllocation.enabled) results in task never being executed.

2014-11-14 Thread Egor Pahomov
YARN, which could be because other jobs are using up all the resources. -Sandy On Fri, Nov 14, 2014 at 11:32 AM, Egor Pahomov pahomov.e...@gmail.com wrote: Hi. I execute ipython notebook + pyspark with spark.dynamicAllocation.enabled = true. Task never ends. Code: import sys from

java.io.FileNotFoundException in usercache

2014-09-25 Thread Egor Pahomov
I work with Spark on an unstable cluster with bad administration. I started getting: 14/09/25 15:29:56 ERROR storage.DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file

Re: SPARK 1.1.0 on yarn-cluster and external JARs

2014-09-25 Thread Egor Pahomov
SparkContext.addJar()? Why didn't you like the fat-jar way? 2014-09-25 16:25 GMT+04:00 rzykov rzy...@gmail.com: We build some SPARK jobs with external jars. I compile jobs by including them in one assembly, but we are looking for an approach to put all external jars into HDFS. We have already put
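
A one-line sketch of the addJar() suggestion; the HDFS path is a placeholder. Note that jars added this way are shipped to executors for tasks, which is not the same as putting them on the driver classpath.

    // spark-shell sketch; "sc" is the SparkContext
    sc.addJar("hdfs:///libs/external-dep.jar")   // hypothetical path to an external jar on HDFS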

pyspark + yarn: how everything works.

2014-07-04 Thread Egor Pahomov
Hi, I want to use PySpark with YARN, but the documentation doesn't give me a full understanding of what's going on, and I simply don't understand the code. So: 1) How is Python shipped to the cluster? Should machines in the cluster already have Python? 2) What happens when I write some Python code in a map function -

K-means faster on Mahout than on Spark

2014-03-25 Thread Egor Pahomov
Hi, I'm running a benchmark which compares Mahout and SparkML. So far I have the following results for k-means: Number of iterations = 10, number of elements = 1000, Mahout time = 602, Spark time = 138. Number of iterations = 40, number of elements = 1000, Mahout time = 1917, Spark time = 330. Number of
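
For reference, a rough sketch of the Spark side of such a benchmark using the RDD-based MLlib API (a later API than the 0.9-era one in this thread; the input path, k and iteration count are placeholders):

    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    // spark-shell sketch; "sc" is the SparkContext
    val points = sc.textFile("hdfs:///tmp/points")   // hypothetical input: space-separated doubles
      .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
      .cache()

    val model = KMeans.train(points, 10, 10)   // k = 10, maxIterations = 10, example values
    println(model.computeCost(points))         // within-cluster sum of squared errors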

Re: K-means faster on Mahout than on Spark

2014-03-25 Thread Egor Pahomov
modes. Sent from my iPhone On Mar 25, 2014, at 9:25 AM, Prashant Sharma scrapco...@gmail.com wrote: I think Mahout uses FuzzyKmeans, which is a different algorithm and is not iterative. Prashant Sharma On Tue, Mar 25, 2014 at 6:50 PM, Egor Pahomov pahomov.e...@gmail.com wrote: Hi, I'm

[Powered by] Yandex Islands powered by Spark

2014-03-16 Thread Egor Pahomov
Hi, the page https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark says I need to write here if I want my project to be added there. At Yandex (www.yandex.com) we are now using Spark for the Yandex Islands project (

Re: Error reading HDFS file using spark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1

2014-02-28 Thread Egor Pahomov
Spark 0.9 uses protobuf 2.5.0. Hadoop 2.2 uses protobuf 2.5.0. protobuf 2.5.0 can read messages serialized with protobuf 2.4.1. So there is no reason why you can't read messages from Hadoop 2.2 with protobuf 2.5.0; probably you somehow have 2.4.1 in your classpath. Of course it's very bad,
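
If a stray protobuf 2.4.1 on the classpath really is the cause, one way to squeeze it out is an explicit exclusion on whichever dependency drags it in. An sbt sketch; the dependency shown is only a placeholder for the pattern, not a known culprit.

    // build.sbt fragment -- "some-hadoop-client" is a hypothetical offender
    libraryDependencies += ("org.example" % "some-hadoop-client" % "1.0")
      .exclude("com.google.protobuf", "protobuf-java")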

Re: Error reading HDFS file using spark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1

2014-02-28 Thread Egor Pahomov
-02-28 23:46 GMT+04:00 Aureliano Buendia buendia...@gmail.com: On Fri, Feb 28, 2014 at 7:17 PM, Egor Pahomov pahomov.e...@gmail.comwrote: Spark 0.9 uses protobuf 2.5.0 Spark 0.9 uses 2.4.1: https://github.com/apache/incubator-spark/blob/4d880304867b55a4f2138617b30600b7fa013b14/pom.xml