[ https://issues.apache.org/jira/browse/SPARK-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593159#comment-14593159 ]
Sean Owen commented on SPARK-8474:
----------------------------------
The max fetch size is per message. It would not affect how many messages you
can pull. Can you clarify what you mean?
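For reference, the per-fetch size cap can be raised through the direct API's
kafkaParams. The sketch below assumes Spark 1.4 with the Kafka 0.8 direct
stream; the app name, broker address, topic name, and 8 MB value are all
hypothetical:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.kafka.KafkaUtils
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("direct-stream-fetch-size")                    // hypothetical app name
      .set("spark.streaming.kafka.maxRatePerPartition", "2000")  // messages/partition/second
    val ssc = new StreamingContext(conf, Seconds(5))

    // fetch.message.max.bytes caps the bytes returned by one fetch request;
    // raising it above the 1 MB default leaves room for large messages.
    val kafkaParams = Map(
      "metadata.broker.list" -> "broker1:9092",                  // hypothetical broker
      "fetch.message.max.bytes" -> (8 * 1024 * 1024).toString)

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events"))                           // hypothetical topic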
> [STREAMING] Kafka DirectStream API stops receiving messages if the collective
> size of the number of messages specified by
> spark.streaming.kafka.maxRatePerPartition exceeds the default fetch size
> (fetch.message.max.bytes) of SimpleConsumer
> ---------------------------------------------------------------------------
>
> Key: SPARK-8474
> URL: https://issues.apache.org/jira/browse/SPARK-8474
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.4.0
> Reporter: Dibyendu Bhattacharya
> Priority: Critical
>
> The issue is that if Kafka holds variable-size messages, ranging from a few
> KB to a few hundred KB, rate limiting by the number of messages can lead to
> a problem.
> Say the message sizes in Kafka are such that under the default
> fetch.message.max.bytes limit (which is 1 MB) only 1000 messages can be
> pulled, while spark.streaming.kafka.maxRatePerPartition is set to, say,
> 2000. With these settings, when the KafkaRDD pulls messages for its offset
> range, it pulls only 1000 messages (limited by the size of the pull in the
> SimpleConsumer API), can never reach the desired untilOffset, and fails in
> this assert call:
> assert(requestOffset == part.untilOffset, errRanOutBeforeEnd(part))
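To make the reported arithmetic concrete, here is a minimal, self-contained
model of the scenario described above. It is NOT the actual KafkaRDD code: it
collapses the consumer to a single size-capped fetch, and the 1 KB message
size and the 2000-offset range are hypothetical numbers chosen so that
roughly 1000 messages fit under the 1 MB cap:

    // Minimal model of the reported failure; not the real KafkaRDD.
    object RanOutBeforeEndModel {
      case class Message(offset: Long, sizeBytes: Int)

      def main(args: Array[String]): Unit = {
        val fetchMessageMaxBytes = 1024 * 1024 // default fetch.message.max.bytes, 1 MB
        val fromOffset = 0L
        val untilOffset = 2000L                // one batch at maxRatePerPartition = 2000

        // One size-capped fetch: with 1 KB messages, exactly 1024 fit in 1 MB.
        def fetch(from: Long): Seq[Message] = {
          var budget = fetchMessageMaxBytes
          Iterator.iterate(from)(_ + 1)
            .takeWhile(_ < untilOffset)
            .map(o => Message(o, 1024))
            .takeWhile { m => budget -= m.sizeBytes; budget >= 0 }
            .toSeq
        }

        var requestOffset = fromOffset
        for (m <- fetch(requestOffset)) {
          requestOffset = m.offset + 1         // advance past each consumed message
        }

        // The assert quoted from KafkaRDD trips here: only 1024 of the 2000
        // requested offsets were covered by the single size-capped fetch.
        assert(requestOffset == untilOffset, s"ran out of messages before $untilOffset")
      }
    }

Note that the real KafkaRDD iterator issues follow-up fetches as each batch
is drained, which is the behavior Sean's question points at; the model above
only shows why a single fetch cannot cover the full offset range.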