I don't remember exactly which configuration I was using. That link has some good information! Thanks, Cody!
On Wed, Nov 16, 2016 at 5:32 PM, Cody Koeninger <c...@koeninger.org> wrote:
> Yeah, if you're reporting issues, please be clear as to whether
> backpressure is enabled, and whether maxRatePerPartition is set.
>
> I expect that there is something wrong with backpressure, see e.g.
> https://issues.apache.org/jira/browse/SPARK-18371
>
> On Wed, Nov 16, 2016 at 5:05 PM, bo yang <bobyan...@gmail.com> wrote:
> > I hit a similar issue with Spark Streaming. The batch size seemed a little
> > random. Sometimes it was large, with many Kafka messages inside the same batch;
> > sometimes it was very small, with just a few messages. Is it possible that
> > this was caused by the backpressure implementation in Spark Streaming?
> >
> > On Wed, Nov 16, 2016 at 4:22 PM, Cody Koeninger <c...@koeninger.org> wrote:
> >>
> >> Moved to user list.
> >>
> >> I'm not really clear on what you're trying to accomplish (why put the
> >> csv file through Kafka instead of reading it directly with Spark?)
> >>
> >> auto.offset.reset=largest just means that when starting the job
> >> without any defined offsets, it will start at the highest (most
> >> recent) available offsets. That's probably not what you want if
> >> you've already loaded csv lines into Kafka.
> >>
> >> On Wed, Nov 16, 2016 at 2:45 PM, Hoang Bao Thien <hbthien0...@gmail.com>
> >> wrote:
> >> > Hi all,
> >> >
> >> > I would like to ask a question related to the size of a Kafka stream. I
> >> > want to put data (e.g., a *.csv file) into Kafka, then use Spark
> >> > Streaming to get the output from Kafka and save it to Hive using
> >> > Spark SQL. The csv file is about 100 MB with ~250K messages/rows
> >> > (each row has about 10 integer fields). I see that Spark Streaming
> >> > first received two partitions/batches, the first of 60K messages and
> >> > the second of 50K. But from the third batch onward, Spark received
> >> > only 200 messages per batch (or partition).
> >> > I think this problem comes from Kafka or some configuration in
> >> > Spark. I already tried the setting "auto.offset.reset=largest",
> >> > but every batch still gets only 200 messages.
> >> >
> >> > Could you please tell me how to fix this problem?
> >> > Thank you so much.
> >> >
> >> > Best regards,
> >> > Alex
> >> >
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> >>
> >
> >
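For reference, the two settings Cody asks about can be passed to spark-submit like this. This is just a sketch for the Spark 1.x/2.0-era direct Kafka stream; the rate value and the application jar name are illustrative, not from the thread:

```shell
# Illustrative spark-submit invocation; "my-streaming-app.jar" and the
# rate of 10000 are placeholder values, not something from this thread.
spark-submit \
  --conf spark.streaming.backpressure.enabled=true \
  --conf spark.streaming.kafka.maxRatePerPartition=10000 \
  my-streaming-app.jar
```

Note that auto.offset.reset is a Kafka consumer parameter, not a Spark conf: it goes in the kafkaParams map passed to the direct stream, and it only takes effect when the job starts with no stored offsets. With the old (0.8) consumer the valid values are "smallest"/"largest"; with the 0.10 consumer they are "earliest"/"latest". If the csv rows were already loaded into the topic before the job started, "largest"/"latest" would skip them, which matches Cody's point above.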