[ https://issues.apache.org/jira/browse/SPARK-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-8133.
------------------------------
Resolution: Invalid
I am not sure this makes sense in the context of Spark Streaming. There is no
persistent partition to speak of; there is a stream of RDDs, each of which has
partitions. In general there is no reason to expect events to fall into one
partition or another, but you can certainly build an RDD from each interval's
RDD with the partitioning you want. In some special cases the RDD partitioning
will follow the upstream source's partitioning, as with a Kafka direct stream.
So, for practical purposes, I suppose this is already supported.
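For illustration, a minimal sketch of that per-interval repartitioning,
assuming a DStream of (customerId, message) pairs; the type and the
4-partition count are taken from the report below, not from any existing API:
{code:scala}
import org.apache.spark.HashPartitioner
import org.apache.spark.streaming.dstream.DStream

// Route records for the same customer ID to the same partition index
// in every batch interval.
def partitionByCustomer(events: DStream[(Int, String)]): DStream[(Int, String)] = {
  val byCustomer = new HashPartitioner(4)  // 4 partitions, as in the Storm setup
  // transform() exposes each interval's RDD; partitionBy() shuffles each
  // record to the partition given by hashing its key, so a given customer
  // ID always maps to the same partition index across batches.
  events.transform(rdd => rdd.partitionBy(byCustomer))
}
{code}
Note this pins a customer to a partition index, not to a physical executor,
which is the sense in which there is no truly "sticky" worker; a per-customer
cache would typically be built inside mapPartitions on each batch, or the
state kept in the stream itself via updateStateByKey.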
In any event this is a question for user@, not a JIRA.
> sticky partitions
> -----------------
>
> Key: SPARK-8133
> URL: https://issues.apache.org/jira/browse/SPARK-8133
> Project: Spark
> Issue Type: New Feature
> Components: Streaming
> Affects Versions: 1.3.1
> Reporter: sid
>
> We are trying to replace Apache Storm with Apache Spark Streaming.
> In Storm, we partitioned the stream by "Customer ID" so that messages with a
> given range of customer IDs are routed to the same bolt (worker).
> We do this because each worker caches customer details (from the DB).
> So we split the stream into 4 partitions and each bolt (worker) handles 1/4
> of the entire range.
> I am hoping there is a solution to this in Spark Streaming.