on your Hadoop UI and verify that both jobs get enough resources.
2017-02-13 11:07 GMT-08:00 Egor Pahomov <pahomov.e...@gmail.com>:
> "But if i increase only executor-cores the finish time is the same". More
> experienced ones can correct me, if I'm wrong, but as far as I
"But if i increase only executor-cores the finish time is the same". More
experienced ones can correct me, if I'm wrong, but as far as I understand
that: one partition processed by one spark task. Task is always running on
1 core and not parallelized among cores. So if you have 5 partitions and
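A minimal PySpark sketch of that point (not from the thread; the data and numbers are made up): with 5 partitions at most 5 tasks run concurrently, so raising parallelism means adding partitions, not just cores.

from pyspark import SparkContext

sc = SparkContext(appName="partitions-vs-cores")

rdd = sc.parallelize(range(1000), 5)   # exactly 5 partitions -> at most 5 concurrent tasks
print(rdd.getNumPartitions())          # 5

wider = rdd.repartition(40)            # more partitions -> more tasks can run in parallel
print(wider.getNumPartitions())        # 40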
> On Thu, Feb 9, 2017, 04:58 Egor Pahomov <pahomov.e...@gmail.com> wrote:
>
>> Just guessing here, but would http://spark.apache.org/
>> docs/latest/streaming-programming-guide.html#basic-sources "*Queue of
>> RDDs as a Stream*" work? Basically create DStrea
; wrote:
>>
>> "Something like that" I've never tried it out myself so I'm only
>> guessing having a brief look at the API.
>>
>> Regards,
>> Jacek Laskowski
>>
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark 2.0 https:
at https://twitter.com/jaceklaskowski
>
>
> On Thu, Feb 9, 2017 at 3:55 AM, Egor Pahomov <pahomov.e...@gmail.com>
> wrote:
> > Jacek, you mean
> > http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.ForeachWriter
> > ? I d
Just guessing here, but have you built your Spark with "-Phive"? By the
way, which version of Zeppelin?
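If it helps, a hedged PySpark check (not from the thread; the session name is arbitrary): when Spark is built without -Phive, requesting Hive support fails once the session is created or the Hive catalog is touched.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-support-check")   # arbitrary name
         .enableHiveSupport()             # requests the Hive catalog implementation
         .getOrCreate())

# Prints "hive" when the build has Hive support, "in-memory" otherwise
print(spark.conf.get("spark.sql.catalogImplementation"))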
2017-02-08 5:13 GMT-08:00 Daniel Haviv :
> Hi,
> I'm using Spark 2.1.0 on Zeppelin.
>
> I can successfully create a table but when I try to select from it I fail:
>
Just guessing here, but would
http://spark.apache.org/docs/latest/streaming-programming-guide.html#basic-sources
"*Queue of RDDs as a Stream*" work? Basically create DStream from your RDD
and than union with other DStream.
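A rough PySpark sketch of that idea (not from the thread; the socket source, host/port, and sample data are placeholders):

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="queue-stream-sketch")
ssc = StreamingContext(sc, 5)                      # 5-second batches

existing_rdd = sc.parallelize([1, 2, 3, 4])        # the RDD you already have
queued = ssc.queueStream([existing_rdd])           # DStream backed by a queue of RDDs
other = ssc.socketTextStream("localhost", 9999).map(lambda s: int(s))

combined = queued.union(other)                     # merge the two DStreams
combined.pprint()

ssc.start()
ssc.awaitTermination()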
2017-02-08 12:32 GMT-08:00 Amit Sela :
> Hi all,
>
Laskowski <ja...@japila.pl>:
> Hi,
>
> Have you considered foreach sink?
>
> Jacek
>
> On 6 Feb 2017 8:39 p.m., "Egor Pahomov" <pahomov.e...@gmail.com> wrote:
>
>> Hi, I'm thinking of using Structured Streaming instead of old streaming,
>> but I ne
le all partitioning information in
> its own metadata log. Is there a specific reason that you want to store the
> information in the Hive Metastore?
>
> Best,
> Burak
>
> On Mon, Feb 6, 2017 at 11:39 AM, Egor Pahomov <pahomov.e...@gmail.com>
> wrote:
>
>> H
Hi, I'm thinking of using Structured Streaming instead of the old streaming,
but I need to be able to save results to a Hive table. Documentation for the file
sink says (
http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-sinks):
"Supports writes to partitioned tables."
Hi, I have the following issue:
I have Zeppelin, which is set up in yarn-client mode. A notebook stays in the Running
state for a long period of time with 0% done, and I do not see even an accepted
application in YARN.
To be able to understand what's going on, I need the logs of the Spark driver,
which is trying to connect to
What about yarn-cluster mode?
2016-07-01 11:24 GMT-07:00 Egor Pahomov <pahomov.e...@gmail.com>:
> Separate bad users with bad queries from good users with good queries. Spark
> does not provide such scope separation out of the box.
>
> 2016-07-01 11:12 GMT-07:00 Jeff Zhang <zjf.
On Fri, Jul 1, 2016 at 10:59 AM, Egor Pahomov <pahomov.e...@gmail.com>
> wrote:
>
>> Takeshi, of course I used a different HIVE_SERVER2_THRIFT_PORT.
>> Jeff, thanks, I'll try, but from your answer I'm getting the feeling
>> that I'm trying a very rare case?
thrift server on one
> machine, then define different SPARK_CONF_DIR, SPARK_LOG_DIR and
> SPARK_PID_DIR for your 2 instances of spark thrift server. Not sure if
> there are other conflicts, but please try first.
>
>
> On Fri, Jul 1, 2016 at 10:47 AM, Egor Pahomov <p
> If you resolve conflict issues such as port conflict, pid file, log file,
> etc., you can run multiple instances of the Spark Thrift server.
>
> On Fri, Jul 1, 2016 at 9:32 AM, Egor Pahomov <pahomov.e...@gmail.com>
> wrote:
>
>> Hi, I'm using Spark Thrift JDBC server and 2 limitations are really
>
Hi, I'm using the Spark Thrift JDBC server and 2 limitations really bother
me:
1) One instance per machine
2) Yarn client only (not yarn-cluster)
Are there any architectural reasons for such limitations? About yarn-client
I might understand in theory - the master is the same process as the server, so
really good news, since it's hard to do
addJar() properly in an Oozie job.
2016-01-12 17:01 GMT-08:00 Egor Pahomov <pahomov.e...@gmail.com>:
> Hi, I'm moving my infrastructure from 1.5.2 to 1.6.0 and experiencing
> serious issue. I successfully updated spark thrift server from 1.5.2 to
Hi, I'm moving my infrastructure from 1.5.2 to 1.6.0 and experiencing a
serious issue. I successfully updated the Spark Thrift server from 1.5.2 to
1.6.0, but I have a standalone application which worked fine with 1.5.2 and is
failing on 1.6.0 with:
*NestedThrowables:*
*java.lang.ClassNotFoundException:
Hi.
I execute an IPython notebook + PySpark with spark.dynamicAllocation.enabled =
true. The task never ends.
Code:
import sys
from random import random
from operator import add

partitions = 10
n = 10 * partitions

def f(_):
    x = random() * 2 - 1
    y = random() * 2 - 1
    return 1 if x ** 2 + y ** 2 < 1 else 0

count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
print("Pi is roughly %f" % (4.0 * count / n))
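For what it's worth, a hedged sketch of the configuration dynamic allocation expects (assuming YARN with the external shuffle service enabled on the NodeManagers; the executor bounds are arbitrary):

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("dynamic-allocation-sketch")
        .set("spark.dynamicAllocation.enabled", "true")
        .set("spark.shuffle.service.enabled", "true")        # required by dynamic allocation
        .set("spark.dynamicAllocation.minExecutors", "1")
        .set("spark.dynamicAllocation.maxExecutors", "10"))
sc = SparkContext(conf=conf)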
YARN, which could be because
other jobs are using up all the resources.
-Sandy
On Fri, Nov 14, 2014 at 11:32 AM, Egor Pahomov pahomov.e...@gmail.com
wrote:
Hi.
I execute ipython notebook + pyspark with spark.dynamicAllocation.enabled
= true. Task never ends.
Code:
import sys
from
I work with Spark on an unstable cluster with bad administration.
I started getting:
14/09/25 15:29:56 ERROR storage.DiskBlockObjectWriter: Uncaught
exception while reverting partial writes to file
SparkContext.addJar()?
Why didn't you like the fat jar way?
2014-09-25 16:25 GMT+04:00 rzykov rzy...@gmail.com:
We build some Spark jobs with external jars. I compile the jobs by including
them in one assembly.
But I'm looking for an approach to put all external jars into HDFS.
We have already put
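One hedged option (not from the thread; the paths are hypothetical) is to point `spark.jars` at jars already sitting in HDFS, so executors fetch them instead of everything being bundled into the assembly:

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("external-jars-from-hdfs")
        # comma-separated list of jars; hdfs:// paths are fetched by the executors
        .set("spark.jars",
             "hdfs:///libs/external-lib-1.jar,hdfs:///libs/external-lib-2.jar"))
sc = SparkContext(conf=conf)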
Hi, I want to use PySpark with YARN, but the documentation doesn't give me a full
understanding of what's going on, and I simply don't understand the code. So:
1) How is Python shipped to the cluster? Should machines in the cluster already have
Python?
2) What happens when I write some Python code in a map function -
Hi, I'm running a benchmark which compares Mahout and SparkML. For now I
have the following results for k-means:
Number of iterations = 10, number of elements = 1000, Mahout time = 602,
Spark time = 138
Number of iterations = 40, number of elements = 1000, Mahout time = 1917,
Spark time = 330
Number of
modes.
Sent from my iPhone
On Mar 25, 2014, at 9:25 AM, Prashant Sharma scrapco...@gmail.com wrote:
I think Mahout uses FuzzyKMeans, which is a different algorithm and is not
iterative.
Prashant Sharma
On Tue, Mar 25, 2014 at 6:50 PM, Egor Pahomov pahomov.e...@gmail.com wrote:
Hi, I'm
Hi, the page https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark says
I need to write here if I want my project to be added there.
At Yandex (www.yandex.com) we are now using Spark for the Yandex Islands project (
Spark 0.9 uses protobuf 2.5.0
Hadoop 2.2 uses protobuf 2.5.0
protobuf 2.5.0 can read messages serialized with protobuf 2.4.1.
So there is no reason why you can't read messages from Hadoop 2.2
with protobuf 2.5.0; probably you somehow have 2.4.1 in your classpath. Of
course it's very bad,
2014-02-28 23:46 GMT+04:00 Aureliano Buendia buendia...@gmail.com:
On Fri, Feb 28, 2014 at 7:17 PM, Egor Pahomov pahomov.e...@gmail.com wrote:
Spark 0.9 uses protobuf 2.5.0
Spark 0.9 uses 2.4.1:
https://github.com/apache/incubator-spark/blob/4d880304867b55a4f2138617b30600b7fa013b14/pom.xml