[ https://issues.apache.org/jira/browse/SPARK-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593159#comment-14593159 ]
Sean Owen commented on SPARK-8474:
----------------------------------
The max fetch size is per message. It would not affect how many messages you
can pull. Can you clarify what you mean?
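For reference, the per-fetch size cap can be raised through the direct API's
kafkaParams. The sketch below assumes Spark 1.4 with the Kafka 0.8 direct
stream; the app name, broker address, topic name, and 8 MB value are all
hypothetical:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.kafka.KafkaUtils
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("direct-stream-fetch-size")                    // hypothetical app name
      .set("spark.streaming.kafka.maxRatePerPartition", "2000")  // messages/partition/second
    val ssc = new StreamingContext(conf, Seconds(5))

    // fetch.message.max.bytes caps the bytes returned by one fetch request;
    // raising it above the 1 MB default leaves room for large messages.
    val kafkaParams = Map(
      "metadata.broker.list" -> "broker1:9092",                  // hypothetical broker
      "fetch.message.max.bytes" -> (8 * 1024 * 1024).toString)

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events"))                           // hypothetical topic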
> [STREAMING] Kafka DirectStream API stops receiving messages if the collective
> size of the number of messages specified by
> spark.streaming.kafka.maxRatePerPartition exceeds the default fetch size
> (fetch.message.max.bytes) of SimpleConsumer
> ---------------------------------------------------------------------------
>
> Key: SPARK-8474
> URL: https://issues.apache.org/jira/browse/SPARK-8474
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.4.0
> Reporter: Dibyendu Bhattacharya
> Priority: Critical
>
> The issue is that if Kafka holds variable-size messages, ranging from a few
> KB to a few hundred KB, rate limiting by the number of messages can lead to
> a problem.
> Say the message sizes in Kafka are such that under the default
> fetch.message.max.bytes limit (which is 1 MB) only 1000 messages can be
> pulled, while spark.streaming.kafka.maxRatePerPartition is set to, say,
> 2000. With these settings, when the KafkaRDD pulls messages for its offset
> range, it pulls only 1000 messages (limited by the size of the pull in the
> SimpleConsumer API), can never reach the desired untilOffset, and fails in
> this assert call:
> assert(requestOffset == part.untilOffset, errRanOutBeforeEnd(part))
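To make the reported arithmetic concrete, here is a minimal, self-contained
model of the scenario described above. It is NOT the actual KafkaRDD code: it
collapses the consumer to a single size-capped fetch, and the 1 KB message
size and the 2000-offset range are hypothetical numbers chosen so that
roughly 1000 messages fit under the 1 MB cap:

    // Minimal model of the reported failure; not the real KafkaRDD.
    object RanOutBeforeEndModel {
      case class Message(offset: Long, sizeBytes: Int)

      def main(args: Array[String]): Unit = {
        val fetchMessageMaxBytes = 1024 * 1024 // default fetch.message.max.bytes, 1 MB
        val fromOffset = 0L
        val untilOffset = 2000L                // one batch at maxRatePerPartition = 2000

        // One size-capped fetch: with 1 KB messages, exactly 1024 fit in 1 MB.
        def fetch(from: Long): Seq[Message] = {
          var budget = fetchMessageMaxBytes
          Iterator.iterate(from)(_ + 1)
            .takeWhile(_ < untilOffset)
            .map(o => Message(o, 1024))
            .takeWhile { m => budget -= m.sizeBytes; budget >= 0 }
            .toSeq
        }

        var requestOffset = fromOffset
        for (m <- fetch(requestOffset)) {
          requestOffset = m.offset + 1         // advance past each consumed message
        }

        // The assert quoted from KafkaRDD trips here: only 1024 of the 2000
        // requested offsets were covered by the single size-capped fetch.
        assert(requestOffset == untilOffset, s"ran out of messages before $untilOffset")
      }
    }

Note that the real KafkaRDD iterator issues follow-up fetches as each batch
is drained, which is the behavior Sean's question points at; the model above
only shows why a single fetch cannot cover the full offset range.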