That's an indication that the beginning offset for a given batch is
higher than the ending offset, i.e. something is seriously wrong.
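The count behind that require is, roughly, the ending offset minus the
beginning offset for each partition in the batch, summed. A simplified
sketch of the arithmetic (illustrative names, not the exact Spark source):

    // Illustrative only; Spark's real OffsetRange lives in
    // org.apache.spark.streaming.kafka and carries topic/partition too.
    case class OffsetRange(fromOffset: Long, untilOffset: Long) {
      def count: Long = untilOffset - fromOffset
    }

    // If any partition reports untilOffset < fromOffset, the batch total
    // can go negative, and the require in StreamInputInfo blows up.
    val ranges = Seq(OffsetRange(100L, 150L), OffsetRange(200L, 120L))
    val numRecords = ranges.map(_.count).sum  // 50 + (-80) = -30
    require(numRecords >= 0, "numRecords must not be negative")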

Are you doing anything at all odd with topics, e.g. deleting and
recreating them, using compacted topics, etc.?
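If you're not sure, kafka-topics will show the topic's partition state and
any non-default configs such as a compaction cleanup policy (the zookeeper
address and topic name below are placeholders):

    kafka-topics.sh --describe --zookeeper zk1:2181 --topic your-topic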

Start off with a very basic stream over the same Kafka topic that just
does foreach println or similar, with no checkpointing at all, and get
that working first.
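Something along these lines, assuming the 0.8 direct stream API that
matches your stack trace (broker list and topic name are placeholders):

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object KafkaSanityCheck {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(
          new SparkConf().setAppName("KafkaSanityCheck"), Seconds(5))

        // Placeholder broker list and topic -- substitute your own.
        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
        val topics = Set("test-topic")

        val stream = KafkaUtils.createDirectStream[
          String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)

        // No checkpointing, no windowing -- just pull a few records per
        // batch back to the driver and print them.
        stream.foreachRDD { rdd => rdd.take(10).foreach(println) }

        ssc.start()
        ssc.awaitTermination()
      }
    }

If that runs clean against the same topic, add your benchmark logic back a
piece at a time until the error reappears.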

On Mon, Feb 20, 2017 at 12:10 PM, Muhammad Haseeb Javed
<11besemja...@seecs.edu.pk> wrote:
> Update: I am using Spark 2.0.2 and Kafka 0.8.2 with Scala 2.10
>
> On Mon, Feb 20, 2017 at 1:06 PM, Muhammad Haseeb Javed
> <11besemja...@seecs.edu.pk> wrote:
>>
>> I am a PhD student at Ohio State working on a study to evaluate streaming
>> frameworks (Spark Streaming, Storm, Flink) using the Intel HiBench
>> benchmarks, but I think I am having a problem with Spark. I have a Spark
>> Streaming application which I am trying to run on a 5-node cluster
>> (including the master). I have 2 ZooKeeper nodes and 4 Kafka brokers.
>> However, whenever I run a Spark Streaming application I encounter the
>> following error:
>>
>> java.lang.IllegalArgumentException: requirement failed: numRecords must not be negative
>>         at scala.Predef$.require(Predef.scala:224)
>>         at org.apache.spark.streaming.scheduler.StreamInputInfo.<init>(InputInfoTracker.scala:38)
>>         at org.apache.spark.streaming.kafka.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:165)
>>         at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
>>         at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
>>         at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
>>         at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
>>         at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
>>         at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
>>         at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:335)
>>         at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:333)
>>         at scala.Option.orElse(Option.scala:289)
>>
>> The application starts fine, but as soon as the Kafka producers start
>> emitting the stream data I start receiving the aforementioned error
>> repeatedly.
>>
>> I have tried removing the Spark Streaming checkpoint files, as has been
>> suggested in similar posts on the internet. However, the problem persists
>> even when I start a Kafka topic and its corresponding consumer Spark
>> Streaming application for the first time, so the problem should not be
>> related to stale offsets.
>>
>> The application does seem to be processing the stream properly, as I can
>> see from the benchmark numbers generated. However, the numbers are way off
>> from what I got for Storm and Flink, leading me to suspect that there is
>> something wrong with the pipeline and Spark is not able to process the
>> stream as cleanly as it should. Any help in this regard would be really
>> appreciated.
>
>
