reste85 commented on issue #1598: URL: https://github.com/apache/incubator-hudi/issues/1598#issuecomment-628462752
It seems like it gets stuck while reading from Kafka. We have 113 million records in our compacted topic, and on every run we try to read 50 million messages. The first two runs worked like a charm, but the third one gets stuck (i.e., when it is trying to read fewer than 50 million messages). If I remove the option `spark.network.timeout=500000` from the spark-submit conf, I get:

```
java.lang.IllegalArgumentException: requirement failed: Failed to get records for compacted spark-executor-topic_consumer topic-changelog-11 after polling for 310000
```

I'm trying to follow this post: https://stackoverflow.com/questions/42264669/spark-streaming-assertion-failed-failed-to-get-records-for-spark-executor-a-gro

Using these properties in the Kafka consumer:

- `spark.streaming.kafka.consumer.poll.ms=310000`
- `request.timeout.ms=30000`
- `max.poll.interval.ms=25000`

I'm still getting the same error.
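For reference, here is a rough sketch of how the settings above might be wired together in a spark-submit invocation. The master, jar path, main class, and properties-file name are placeholders and assumptions, not taken from the report; DeltaStreamer typically picks up Kafka consumer properties (such as `request.timeout.ms`) from a `--props` file rather than Spark conf:

```shell
# Sketch only: master, jar, class, and props file are hypothetical placeholders.
spark-submit \
  --master yarn \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  --conf spark.network.timeout=500000 \
  --conf spark.streaming.kafka.consumer.poll.ms=310000 \
  hudi-utilities-bundle.jar \
  --props kafka-source.properties  # would hold request.timeout.ms, max.poll.interval.ms
```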
