Hi,
Reconsidering the execution model behind Streaming would be a good
candidate here, as Spark will not be able to provide the low latency and
sophisticated windowing semantics that more and more use cases will
require. Maybe relaxing the strict batch model would help a lot. (Mainly
this would hi
I personally build with SBT and run Spark on YARN from IntelliJ. You need
to connect to the remote JVMs with a remote debugger. You need to do the
same if you use Python, because it will launch a JVM on the driver as
well.
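One way to set this up (a sketch; the port numbers, class name, and JAR name are placeholders, not from the original thread) is to pass JDWP agent options through Spark's extra-Java-options settings and point an IntelliJ "Remote" run configuration at the chosen port:

```shell
# Hedged sketch: suspend the driver JVM until a remote debugger attaches.
# Ports 5005/5006 and the application coordinates are illustrative.
spark-submit \
  --master yarn-client \
  --conf "spark.driver.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005" \
  --conf "spark.executor.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5006" \
  --class com.example.MyApp myapp.jar
```

With `suspend=y` the driver waits for the debugger before running, which is usually what you want for stepping through startup; executors typically use `suspend=n` so they don't block the whole application.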
On Wed, Aug 19, 2015 at 2:10 PM canan chen wrote:
> Thanks Ted. I notice
Hi,
Is there any way to bypass the limitations of SparkSqlSerializer2 in the
SQL module? Namely,
1) it does not support complex types, and
2) it assumes key-value pairs.
Is there any other pluggable serializer that can be used here?
Thanks!
Why is reduce in DStream implemented with a map, a reduceByKey, and
another map, given that we already have RDD.reduce?
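For reference, the shape of that implementation can be reproduced on a plain Scala collection (a sketch of the pattern, not Spark's actual code): key every element with a dummy key, reduce by key, then drop the key. Expressed this way, reduce reuses the same shuffle-based machinery as reduceByKey.

```scala
// Sketch of the reduce-via-reduceByKey pattern on a plain Scala collection.
object ReducePattern {
  def reduceLikeDStream[T](data: Seq[T], f: (T, T) => T): T = {
    val keyed = data.map(v => ((), v))      // map: attach a dummy key
    val reducedByKey = keyed                // reduceByKey analogue
      .groupBy(_._1)
      .map { case (k, vs) => (k, vs.map(_._2).reduce(f)) }
    reducedByKey.map(_._2).head             // map: drop the key again
  }
}
```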
>
> In fact, the flow is: allocator.allocateResources() -> sleep ->
> allocator.allocateResources() -> sleep …
>
> But I guess that on the first allocateResources() call the allocation is
> not fulfilled, so the sleep occurs.
>
>
>
> *From:* Zoltán Zvara [mailto:zoltan.
pache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
>
> On Thu, Apr 9, 2015 at 1:15 AM, Zoltán Zvara
> wrote:
>
>> Dear Developers,
>>
>> I'm trying to investigate the communication pattern regarding data-flow
I'm trying to debug Spark in yarn-client mode. On my local single-node
cluster everything works fine, but the remote YARN resource manager rejects
my request because of an authentication error. I'm running IntelliJ 14 on
Ubuntu and the driver tries to connect to YARN with my local user name. How
Dear Developers,
I'm trying to investigate the communication pattern regarding data-flow
during execution of a Spark program defined by an RDD chain. I'm
investigating from the Task point of view, and found out that the task type
ResultTask (as retrieving the iterator for its RDD for a given parti
It does not seem to be safe to call RDD.firstParent from anywhere, as it
might throw a java.util.NoSuchElementException: "head of empty list". This
looks like a bug from the perspective of a consumer of the RDD API.
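A minimal illustration of the failure mode (names mine, not Spark's): firstParent essentially takes the head of the RDD's dependency list, and `.head` on an empty Scala list throws exactly this exception; `headOption` is the defensive alternative a caller would need.

```scala
// .head on an empty list throws NoSuchElementException, which is what an
// RDD with no parent dependencies (e.g. a source RDD) would trigger.
object FirstParentDemo {
  def firstParentUnsafe(deps: List[String]): String = deps.head  // may throw
  def firstParentSafe(deps: List[String]): Option[String] = deps.headOption
}
```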
Zvara Zoltán
mail, hangout, skype: zoltan.zv...@gmail.com
mobile, viber: +36203129543
elte: HSKSJZ (ZVZOAAI.ELTE)
2015-03-25 9:45 GMT+01:00 Zoltán Zvara :
Hi!
I'm using the latest IntelliJ and I can't compile the yarn project into the
Spark assembly fat JAR. That is why I'm getting a SparkException with the
message "Unable to load YARN support". The yarn project is also missing
from the SBT tasks and I can't add it. How can I force SBT to include it?
Thanks!
2015-03-24 16:42 GMT+01:00 Sandy Ryza :
> That's correct. What's the reason this information is needed?
>
> -Sandy
>
> On Tue, Mar 24, 2015 at 11:41 AM, Zoltán Zvara
> wrote:
>
>> Thank
for the
> amount that YARN has rounded up if those configuration properties
> (yarn.scheduler.minimum-allocation-mb and
> yarn.scheduler.increment-allocation-mb) are not present on the node.
>
> -Sandy
>
> On Mon, Mar 23, 2015 at 5:08 PM, Zoltán Zvara
> wrote:
ore lines into storage instead of in memory. Could Spark Streaming work
> this way? Does Flink work like this?
>
>
> On Tue, Mar 24, 2015 at 7:04 PM Zoltán Zvara
> wrote:
>
There is a BlockGenerator on each worker node next to the
ReceiverSupervisorImpl, which generates Blocks out of an ArrayBuffer at
each interval (block_interval). These Blocks are passed to the
ReceiverSupervisorImpl, which puts these blocks into the BlockManager
for storage. BlockInfos are passed
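The batching idea described above can be sketched in a few lines of plain Scala (a toy model; the class and method names are mine, and in Spark the interval is driven by a recurring timer at `spark.streaming.blockInterval` rather than called by hand):

```scala
import scala.collection.mutable.ArrayBuffer

// Toy model of the block-interval batching idea: records accumulate in a
// buffer; at each interval the buffer is swapped out and its contents
// become one immutable block.
class ToyBlockGenerator[T] {
  private var buffer = new ArrayBuffer[T]
  private val blocks = new ArrayBuffer[Seq[T]]

  def addData(record: T): Unit = synchronized { buffer += record }

  // In Spark this would fire once per block interval.
  def updateCurrentBuffer(): Unit = synchronized {
    if (buffer.nonEmpty) {
      blocks += buffer.toSeq
      buffer = new ArrayBuffer[T]
    }
  }

  def generatedBlocks: Seq[Seq[T]] = synchronized { blocks.toSeq }
}
```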
Let's say I'm an Executor instance in a Spark system. Who started me, and
where, when I run on a worker node supervised by (a) Mesos or (b) YARN? I
suppose I'm the only Executor on a worker node for a given framework
scheduler (driver). If I'm an Executor instance, who is the closest object
to me
I'm trying to understand the task scheduling mechanism of Spark, and I'm
curious about where locality preferences get evaluated. I'm trying to
determine whether locality preferences are fetchable before the task gets
serialized. Any hints would be most appreciated!
Have a nice day!
Zvara Zoltán
I'm trying to understand the block allocation mechanism Spark uses to
generate batch jobs and a JobSet.
The JobGenerator.generateJobs method tries to allocate received blocks to a
batch; effectively, ReceivedBlockTracker.allocateBlocksToBatch creates
a streamIdToBlocks map, where stream IDs (Int) are mapped to S
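The allocate-blocks-to-batch idea can be sketched as a toy tracker (names and types are mine, simplified from the description above): each stream has a queue of unallocated block IDs, and at batch time every queue is drained into a per-batch map from stream ID to blocks.

```scala
import scala.collection.mutable

// Toy model: unallocated blocks per stream are drained into a map keyed
// by batch time, so each block is allocated to exactly one batch.
class ToyBlockTracker {
  private val unallocated = mutable.Map[Int, mutable.Queue[String]]()
  private val allocated = mutable.Map[Long, Map[Int, Seq[String]]]()

  def addBlock(streamId: Int, blockId: String): Unit =
    unallocated.getOrElseUpdate(streamId, mutable.Queue.empty[String]) += blockId

  def allocateBlocksToBatch(batchTime: Long): Unit = {
    val streamIdToBlocks = unallocated.map { case (id, q) =>
      id -> q.dequeueAll(_ => true).toSeq
    }.toMap
    allocated(batchTime) = streamIdToBlocks
  }

  def blocksOf(batchTime: Long, streamId: Int): Seq[String] =
    allocated.getOrElse(batchTime, Map()).getOrElse(streamId, Seq())
}
```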