When you say "launch long-running tasks", do you mean long-running Spark
jobs/tasks, or long-running tasks in another system?

If the rate of requests from Kafka is not high (in terms of records per
second), you could collect the records in the driver and maintain the
"shared bag" there. A separate thread in the driver could pick items from
the bag and launch "tasks". This is a slightly unorthodox use of Spark
Streaming, but it should work.
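
For the low-rate case, here is a rough (untested) sketch of what I mean.
It assumes the 0.9 KafkaUtils receiver API; the ZooKeeper quorum, topic
name, and launchTask() are placeholders you would fill in yourself:

import java.util.concurrent.ConcurrentHashMap
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object RequestBagDemo {
  // Shared bag, lives in the driver JVM only, keyed by request ID
  val bag = new ConcurrentHashMap[String, String]()

  // Placeholder: kick off your long-running work for one request
  def launchTask(id: String, payload: String) { }

  def main(args: Array[String]) {
    val ssc = new StreamingContext("local[4]", "RequestBag", Seconds(5))

    // Placeholder connection details: ZK quorum, consumer group, topic
    val stream = KafkaUtils.createStream(
      ssc, "zkhost:2181", "request-group", Map("requests" -> 1))

    // Pull each batch back to the driver and stash it in the bag.
    // This is only viable because the record rate is assumed to be low.
    stream.foreachRDD { rdd =>
      rdd.collect().foreach { case (id, payload) => bag.put(id, payload) }
    }

    // Separate driver-side thread that drains the bag and launches tasks
    new Thread(new Runnable {
      def run() {
        while (true) {
          val it = bag.entrySet().iterator()
          while (it.hasNext) {
            val e = it.next()
            launchTask(e.getKey, e.getValue)
            it.remove()
          }
          Thread.sleep(1000)
        }
      }
    }).start()

    ssc.start()
    ssc.awaitTermination()
  }
}

Since the bag lives in the driver, you can also look up a task by ID
on demand with bag.get(id), which sounds like your requirement (2).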

If the rate of requests from Kafka is high, then I am not sure how you can
sustain that many long-running tasks (assuming one task corresponds to each
request from Kafka).

TD


On Wed, Mar 26, 2014 at 1:19 AM, Bryan Bryan <bryanbryan...@gmail.com> wrote:

> Hi there,
>
> I have read about the two fundamental shared features in Spark (broadcast
> variables and accumulators), but this is what I need:
>
> I'm using Spark Streaming to receive requests from Kafka. These requests
> may launch long-running tasks, and I need to control them:
>
> 1) Keep them in a shared bag, like a HashMap, to retrieve them by ID, for
> example.
> 2) Retrieve an instance of this object/task on demand (on request, in
> fact).
>
>
> Any ideas about that? How can I share objects between slaves? Could I use
> something outside of Spark (maybe Hazelcast)?
>
>
> Regards.
>
