The same executor can be used for both receiving and processing,
irrespective of the deployment mode (YARN, Spark standalone, etc.). It
boils down to the number of cores / task slots that the executor has. Each
receiver is like a long-running task, so each of them occupies a slot. If
there are free slots in the executor, then other tasks can be run on them.
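As a concrete illustration, here is a minimal local-mode sketch (the host,
port, and batch interval are made up for this example): with "local[2]" the
single executor has two task slots, so the receiver occupies one and the
per-batch processing tasks get the other. With "local[1]" the receiver
would hold the only slot and nothing else would ever run.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Two task slots: one for the long-running receiver task,
    // one left free for the map/print tasks of each batch.
    val conf = new SparkConf().setMaster("local[2]").setAppName("ReceiverSlots")
    val ssc = new StreamingContext(conf, Seconds(1))

    // One input DStream => one receiver => one permanently occupied slot.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.map(_.length).print()   // runs on the remaining free slot

    ssc.start()
    ssc.awaitTermination()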

So if you are finding that no other tasks are being run, check how many
cores / task slots the executor has and whether there are more task slots
than the number of input DStreams / receivers you are launching.
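For example, continuing the sketch above (the ports are again made up),
launching three input DStreams gives you three receivers and therefore
three permanently occupied slots, so the total number of cores across your
executors must be greater than three or the processing tasks will starve.
On YARN you would size this with the --executor-cores and --num-executors
flags of spark-submit.

    // Three input DStreams => three receivers => three occupied slots.
    // Total cores across all executors must exceed 3 for processing to run.
    val numReceivers = 3
    val streams = (1 to numReceivers).map(i =>
      ssc.socketTextStream("localhost", 9000 + i))
    val unioned = ssc.union(streams)   // consume all receivers as one DStream
    unioned.count().print()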

@Praveen, your answers were pretty much spot on, thanks for chipping in!

On Fri, Jul 11, 2014 at 11:16 AM, Yan Fang <yanfang...@gmail.com> wrote:

> Hi Praveen,
>
> Thank you for the answer. That's interesting because if I only bring up
> one executor for Spark Streaming, it seems only the receiver is working and
> no other tasks are happening, judging from the log and UI. Maybe it's just
> because the receiving task eats all the resources, not because one executor
> can only run one receiver?
>
> Fang, Yan
> yanfang...@gmail.com
> +1 (206) 849-4108
>
>
> On Fri, Jul 11, 2014 at 6:06 AM, Praveen Seluka <psel...@qubole.com>
> wrote:
>
>> Here are my answers, but I am just getting started with Spark Streaming,
>> so please correct me if I am wrong.
>> 1) Yes.
>> 2) Receivers will run on executors. It's actually a job that's submitted,
>> where the number of tasks equals the number of receivers. An executor can
>> actually run more than one task at the same time, so you could have more
>> receivers than executors, but I think it's not recommended.
>> 3) As said in 2), the executor where the receiver task is running can be
>> used for map/reduce tasks. In yarn-cluster mode, the driver program actually
>> runs as the application master (it lives in the first container that's
>> launched), and this is not an executor, so it's not used for other
>> operations.
>> 4) The driver runs in a separate container. I think the same executor can
>> be used for both the receiver and the processing tasks (I am not very sure
>> about this part).
>>
>>
>>  On Fri, Jul 11, 2014 at 12:29 AM, Yan Fang <yanfang...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I am working to improve the parallelism of my Spark Streaming
>>> application, but I have trouble understanding how the executors are used
>>> and how the application is distributed.
>>>
>>> 1. In YARN, is one executor equal to one container?
>>>
>>> 2. I saw the statement that a streaming receiver runs on one worker
>>> machine ("note that each input DStream creates a single receiver (running
>>> on a worker machine) that receives a single stream of data"). Does the
>>> "worker machine" mean an executor or a physical machine? If I have more
>>> receivers than executors, will it still work?
>>>
>>> 3. Is the executor that holds the receiver also used for other
>>> operations, such as map and reduce, or is it fully occupied by the
>>> receiver? Similarly, if I run in yarn-cluster mode, is the executor
>>> running the driver program used by other operations too?
>>>
>>> 4. So if I have a driver program (cluster mode) and a streaming receiver,
>>> do I have to have at least 2 executors, because the program and the
>>> streaming receiver have to be on different executors?
>>>
>>> Thank you. Sorry for asking so many questions, but I do want to
>>> understand how Spark Streaming is distributed in order to assign
>>> reasonable resources. Thank you again.
>>>
>>> Best,
>>>
>>> Fang, Yan
>>> yanfang...@gmail.com
>>> +1 (206) 849-4108
>>>
>>
>>
>
