I did some investigation yesterday and just posted my findings in the ticket. Please read my latest comment in https://issues.apache.org/jira/browse/SPARK-18057
On Fri, Mar 10, 2017 at 11:41 AM, Cody Koeninger <c...@koeninger.org> wrote:

> There are existing tickets on the issues around kafka versions, e.g.
> https://issues.apache.org/jira/browse/SPARK-18057 that haven't gotten
> any committer weigh-in on direction.
>
> On Thu, Mar 9, 2017 at 12:52 PM, Oscar Batori <oscarbat...@gmail.com>
> wrote:
> > Guys,
> >
> > To change the subject from meta-voting...
> >
> > We are doing Spark Streaming against a Kafka setup, everything is pretty
> > standard, and pretty current. In particular we are using Spark 2.1, and
> > Kafka 0.10.1, with batch windows that are quite large (5-10 minutes). The
> > problem we are having is pretty well described in the following excerpt
> > from the Spark documentation:
> > "For possible kafkaParams, see Kafka consumer config docs. If your Spark
> > batch duration is larger than the default Kafka heartbeat session timeout
> > (30 seconds), increase heartbeat.interval.ms and session.timeout.ms
> > appropriately. For batches larger than 5 minutes, this will require
> > changing group.max.session.timeout.ms on the broker. Note that the example
> > sets enable.auto.commit to false, for discussion see Storing Offsets
> > below."
> >
> > In our case "group.max.session.timeout.ms" is set to default value, and
> > our processing time per batch easily exceeds that value. I did some
> > further hunting around and found the following SO post:
> > "KIP-62, decouples heartbeats from calls to poll() via a background
> > heartbeat thread. This, allow for a longer processing time (ie, time
> > between two consecutive poll()) than heartbeat interval."
> >
> > This pretty accurately describes our scenario: effectively our per batch
> > processing time is 2-6 minutes, well within the batch window, but in
> > excess of the max session timeout between polls, causing the consumer to
> > be kicked out of the group.
> >
> > Are there any plans to move the Kafka client up to 0.10.1 and make this
> > feature available to consumers? Or have I missed some helpful
> > configuration that would ameliorate this problem? I recognize changing
> > "group.max.session.timeout.ms" is one solution, though it seems doing
> > heartbeat checking outside of implicitly piggy backing on polling seems
> > more elegant.
> >
> > -Oscar
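
For reference, the timeout arithmetic from the quoted Spark documentation can be sketched as below. This is only an illustration under stated assumptions, not anything from Spark or Kafka: `suggest_kafka_timeouts` is a hypothetical helper, the 20% safety margin is arbitrary, the broker limit of 300000 ms is assumed from the "larger than 5 minutes" remark in the excerpt, and the one-third ratio follows the usual Kafka guidance that heartbeat.interval.ms should be no higher than a third of session.timeout.ms. The resulting dictionary only mirrors the keys one would put into kafkaParams.

```python
def suggest_kafka_timeouts(batch_seconds, broker_max_session_ms=300_000, margin_pct=20):
    """Hypothetical helper: derive consumer timeouts from a batch duration.

    broker_max_session_ms stands in for the broker's
    group.max.session.timeout.ms (assumed 300000 ms here).
    """
    # Session timeout must outlast one whole batch, plus an arbitrary margin.
    session_ms = batch_seconds * 1000 * (100 + margin_pct) // 100
    params = {
        "session.timeout.ms": session_ms,
        # Keep the heartbeat interval at one third of the session timeout.
        "heartbeat.interval.ms": session_ms // 3,
        # The documentation example disables auto commit.
        "enable.auto.commit": False,
    }
    # True when the broker would reject this session timeout unless
    # group.max.session.timeout.ms is raised on the broker side.
    needs_broker_change = session_ms > broker_max_session_ms
    return params, needs_broker_change


if __name__ == "__main__":
    for seconds in (120, 360, 600):
        params, needs_broker_change = suggest_kafka_timeouts(seconds)
        print(seconds, params, "broker change needed:", needs_broker_change)
```

With these assumptions a 2 minute batch fits under the broker limit, while 6 and 10 minute batches would both need group.max.session.timeout.ms raised on the broker, which matches the situation Oscar describes.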