[
https://issues.apache.org/jira/browse/SPARK-22046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16178106#comment-16178106
]
danny mor commented on SPARK-22046:
-----------------------------------
The problem is not with scheduling the tasks throughout the cluster, but with
where the data ends up after the shuffle.
> Streaming State cannot be scalable
> ----------------------------------
>
> Key: SPARK-22046
> URL: https://issues.apache.org/jira/browse/SPARK-22046
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 2.2.0
> Environment: OS: amazon linux,
> Streaming Source: kafka 0.10
> vm: aws ec2
> cluster resources: 16Gb per worker, single executor per worker, 8 cores per
> executor
> storage: hdfs
> Reporter: danny mor
> Priority: Minor
>
> State cannot be distributed across the cluster.
> When {color:#59afe1}StateStoreRDD{color}'s
> {color:#59afe1}getPreferredLocations{color} is called, it
> creates a {color:#59afe1}StateStoreId(checkpointLocation, operatorId,
> partition.index){color},
> sends it to the {color:#59afe1}StateStoreCoordinator{color}, which holds a
> hashmap from {color:#59afe1}StateStoreId{color} to
> {color:#59afe1}ExecutorCacheTaskLocation{color}, and returns the executorId
> if the store is cached there.
> The operatorId is generated once per batch in the
> {color:#59afe1}IncrementalExecution{color} instance,
> but it is almost always 0 since {color:#59afe1}IncrementalExecution{color} is
> instantiated anew each batch.
> The partition index is bounded by the configured value of
> {color:#14892c}"spark.sql.shuffle.partitions"{color} (in my case the default
> 200),
> so the cache is limited to 200 entries, with no regard to the data keys
> themselves.
> When new executors join the cluster and new keys arrive in the streaming
> data, the distribution of state is unaffected, because the
> {color:#59afe1}StateStoreId{color} takes neither into account.
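To make the cardinality argument above concrete, here is a minimal Python sketch (not Spark's actual code; the checkpoint path, executor names, and key-to-partition hashing are illustrative stand-ins) modelling a coordinator map keyed only by (checkpointLocation, operatorId, partitionIndex):

```python
# Illustrative model of the StateStoreCoordinator's location cache as
# described in the issue. The data key itself never appears in the
# cache key, so the cache can never exceed the shuffle-partition count.
from dataclasses import dataclass


@dataclass(frozen=True)
class StateStoreId:
    checkpoint_location: str
    operator_id: int      # effectively always 0, per the report
    partition_index: int  # bounded by spark.sql.shuffle.partitions


SHUFFLE_PARTITIONS = 200  # default value of spark.sql.shuffle.partitions


def coordinator_cache(data_keys):
    """Build the coordinator's StateStoreId -> executor map for a stream
    of data keys. hash() stands in for Spark's shuffle partitioner."""
    cache = {}
    for key in data_keys:
        partition = hash(key) % SHUFFLE_PARTITIONS
        store_id = StateStoreId("/checkpoint", 0, partition)
        cache[store_id] = f"executor-for-partition-{partition}"
    return cache


# Even with far more distinct keys than partitions, the cache never
# grows past SHUFFLE_PARTITIONS entries:
cache = coordinator_cache(f"key-{i}" for i in range(100_000))
assert len(cache) <= SHUFFLE_PARTITIONS
```

The point of the sketch: adding executors or new stream keys changes nothing in this map, because neither appears in the key the coordinator hashes on.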
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)