Github user victor-wong commented on the pull request:

    https://github.com/apache/storm/pull/1443#issuecomment-221528200
  
    > Can you instead log an Error, ignore the message and proceed further?
    
    This means data loss for users, and I am not sure that is acceptable. As 
I mentioned above, 
[ConsumerIterator](https://github.com/apache/kafka/blob/a81ad2582ee0e533d335fe0dc5c5cc885dbf645d/core/src/main/scala/kafka/consumer/ConsumerIterator.scala) 
chooses to throw an exception (MessageSizeTooLargeException, which causes the 
Kafka consumer to stop working), so I think that may be the right approach. 
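
    For reference, the check ConsumerIterator performs amounts to roughly the 
following (a minimal Java sketch; the class, method, and parameter names here 
are illustrative, not the actual Kafka internals):
    
    ```java
    import kafka.javaapi.message.ByteBufferMessageSet;
    
    public class FetchSizeCheck {
        // If a fetch returned bytes but no complete message fits within the
        // fetch size, validBytes() stays 0: fail loudly instead of silently
        // dropping or skipping the oversized message.
        public static void ensureFetchSizeSufficient(ByteBufferMessageSet msgs,
                                                     int fetchSizeBytes,
                                                     long offset) {
            if (msgs.sizeInBytes() > 0 && msgs.validBytes() == 0) {
                throw new RuntimeException("Found a message larger than the fetch size ("
                        + fetchSizeBytes + " bytes) at offset " + offset
                        + "; increase KafkaConfig.fetchSizeBytes");
            }
        }
    }
    ```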
    
    > what do you mean by "topology will fetch no data but still be running"? 
Will it stop fetching data at all?
    
    The spout will keep trying to fetch data, but the response from Kafka 
contains no valid bytes because of the size limit. The side effect is that 
data in the Kafka topic piles up while the user has no idea why their Storm 
topology has stopped processing messages (from the spout's point of view, 
there is no data to process).
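
    To make the failure mode concrete, here is a simplified sketch of the kind 
of SimpleConsumer fetch loop the spout runs (illustrative only, not the actual 
storm-kafka code; the processing step is a placeholder):
    
    ```java
    import kafka.api.FetchRequest;
    import kafka.api.FetchRequestBuilder;
    import kafka.javaapi.FetchResponse;
    import kafka.javaapi.consumer.SimpleConsumer;
    import kafka.javaapi.message.ByteBufferMessageSet;
    import kafka.message.MessageAndOffset;
    
    public class StalledFetchLoopSketch {
        public static void fetchLoop(SimpleConsumer consumer, String topic,
                                     int partition, long offset, int fetchSizeBytes) {
            while (true) {
                FetchRequest req = new FetchRequestBuilder()
                        .clientId("sketch")
                        .addFetch(topic, partition, offset, fetchSizeBytes)
                        .build();
                FetchResponse resp = consumer.fetch(req);
                ByteBufferMessageSet msgs = resp.messageSet(topic, partition);
                // The iterator is empty when no complete message fits within
                // fetchSizeBytes, so the offset never advances below.
                for (MessageAndOffset mao : msgs) {
                    // process(mao) -- placeholder for emitting the tuple
                    offset = mao.nextOffset();
                }
                // With an empty message set, the loop re-fetches the same
                // offset forever: the topology "runs" while the topic piles up.
            }
        }
    }
    ```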
    
    
    I think you are right that many users don't want to stall the worker 
because of one large message, but this situation is the result of an 
incorrect config (KafkaConfig.fetchSizeBytes); to avoid it, they need to set 
a sufficiently large fetch size limit in the first place, as sketched below.
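
    For example, raising the limit when building the spout config looks 
roughly like this (the ZooKeeper address, topic, zkRoot, and spout id are 
placeholders):
    
    ```java
    import org.apache.storm.kafka.BrokerHosts;
    import org.apache.storm.kafka.KafkaSpout;
    import org.apache.storm.kafka.SpoutConfig;
    import org.apache.storm.kafka.ZkHosts;
    
    public class LargeFetchSpoutExample {
        public static void main(String[] args) {
            BrokerHosts hosts = new ZkHosts("zookeeper:2181");
            // SpoutConfig extends KafkaConfig, so fetchSizeBytes is the same field.
            SpoutConfig conf = new SpoutConfig(hosts, "my-topic", "/kafka-offsets", "my-spout-id");
            conf.fetchSizeBytes = 10 * 1024 * 1024; // 10 MB: above the largest expected message
            KafkaSpout spout = new KafkaSpout(conf);
        }
    }
    ```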

