I don't remember exactly which configuration I was using. That link has some good information! Thanks, Cody!
On Wed, Nov 16, 2016 at 5:32 PM, Cody Koeninger <c...@koeninger.org> wrote:
> Yeah, if you're reporting issues, please be clear as to whether
> backpressure is enabled, and whether maxRatePerPartition is set.
>
> I expect that there is something wrong with backpressure, see e.g.
> https://issues.apache.org/jira/browse/SPARK-18371
>
> On Wed, Nov 16, 2016 at 5:05 PM, bo yang <bobyan...@gmail.com> wrote:
> > I hit a similar issue with Spark Streaming. The batch size seemed a little
> > random. Sometimes it was large, with many Kafka messages inside the same batch;
> > sometimes it was very small, with just a few messages. Is it possible that
> > this was caused by the backpressure implementation in Spark Streaming?
> >
> > On Wed, Nov 16, 2016 at 4:22 PM, Cody Koeninger <c...@koeninger.org> wrote:
> >>
> >> Moved to user list.
> >>
> >> I'm not really clear on what you're trying to accomplish (why put the
> >> csv file through Kafka instead of reading it directly with Spark?)
> >>
> >> auto.offset.reset=largest just means that when starting the job
> >> without any defined offsets, it will start at the highest (most
> >> recent) available offsets. That's probably not what you want if
> >> you've already loaded csv lines into Kafka.
> >>
> >> On Wed, Nov 16, 2016 at 2:45 PM, Hoang Bao Thien <hbthien0...@gmail.com>
> >> wrote:
> >> > Hi all,
> >> >
> >> > I would like to ask a question related to the size of a Kafka stream. I
> >> > want to put data (e.g., a *.csv file) into Kafka, then use Spark
> >> > Streaming to get the output from Kafka and save it to Hive using
> >> > Spark SQL. The csv file is about 100 MB with ~250K messages/rows
> >> > (each row has about 10 integer fields). I see that Spark Streaming
> >> > first received two partitions/batches, the first of 60K messages and
> >> > the second of 50K. But from the third batch onward, Spark received
> >> > only 200 messages per batch (or partition).
> >> > I think this problem comes from Kafka or some configuration in
> >> > Spark. I already tried the setting "auto.offset.reset=largest",
> >> > but every batch still gets only 200 messages.
> >> >
> >> > Could you please tell me how to fix this problem?
> >> > Thank you so much.
> >> >
> >> > Best regards,
> >> > Alex
> >> >
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> >>
> >
> >
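For reference, the two settings Cody asks about can be passed to spark-submit like this. This is just a sketch for the Spark 1.x/2.0-era direct Kafka stream; the rate value and the application jar name are illustrative, not from the thread:

```shell
# Illustrative spark-submit invocation; "my-streaming-app.jar" and the
# rate of 10000 are placeholder values, not something from this thread.
spark-submit \
  --conf spark.streaming.backpressure.enabled=true \
  --conf spark.streaming.kafka.maxRatePerPartition=10000 \
  my-streaming-app.jar
```

Note that auto.offset.reset is a Kafka consumer parameter, not a Spark conf: it goes in the kafkaParams map passed to the direct stream, and it only takes effect when the job starts with no stored offsets. With the old (0.8) consumer the valid values are "smallest"/"largest"; with the 0.10 consumer they are "earliest"/"latest". If the csv rows were already loaded into the topic before the job started, "largest"/"latest" would skip them, which matches Cody's point above.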