Re: Spark Streaming Kafka - DirectKafkaInputDStream: Using the new Kafka Consumer API

2015-12-04 Thread Mario Ds Briggs

>>
 forcing people on kafka 8.x to upgrade their brokers is questionable.
<<

I agree, and I was thinking more that maybe there is a way to support both for a
period of time (which of course means some more code to maintain :-)).


thanks
Mario



From:   Cody Koeninger 
To: Mario Ds Briggs/India/IBM@IBMIN
Cc: "dev@spark.apache.org" 
Date:   04/12/2015 12:15 am
Subject: Re: Spark Streaming Kafka - DirectKafkaInputDStream: Using the
new Kafka Consumer API



Honestly my feeling on any new API is to wait for a point release before
taking it seriously :)

Auth and encryption seem like the only compelling reason to move, but
forcing people on kafka 8.x to upgrade their brokers is questionable.

On Thu, Dec 3, 2015 at 11:30 AM, Mario Ds Briggs 
wrote:
  Hi,

  Wanted to pick Cody's mind on what he thinks about
  DirectKafkaInputDStream/KafkaRDD internally using the new Kafka consumer
  API. I know the latter is documented as beta quality, but I still wanted to
  know whether he sees any blockers to going there shortly. On my side the
  consideration is that Kafka 0.9.0.0 introduced authentication and
  encryption (beta again) between clients & brokers, but these are
  available only in the new Consumer API and not in the older
  low-level/high-level APIs.
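
  Just to make that concrete, this is roughly the kind of client-side setup
  0.9.0.0 enables for the new consumer (the broker address, paths and
  passwords below are placeholders, not anything from our code):

      import java.util.Properties

      // Illustrative SSL settings for the new 0.9 consumer; all values are placeholders.
      val props = new Properties()
      props.put("bootstrap.servers", "broker1:9093")
      props.put("security.protocol", "SSL")
      props.put("ssl.truststore.location", "/path/to/client.truststore.jks")
      props.put("ssl.truststore.password", "changeit")
      props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")
      props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")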

  From briefly studying the implementation of
  DirectKafkaInputDStream/KafkaRDD and the new Consumer API, my thinking is
  that it is possible to support exactly the current implementation you have
  using the new API.
  One area that isn't so straightforward is that the constructor of KafkaRDD
  fixes the offsetRange (I did read about the determinism you were after),
  and I couldn't find a direct method in the new Consumer API to get the
  current 'latest' offset - however, one can do a consumer.seekToEnd() and
  then call consumer.position().
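
  A minimal sketch of what I mean, against the 0.9.0.0 consumer (the helper
  name and generic types below are just mine for illustration):

      import java.util.{Arrays => JArrays}
      import org.apache.kafka.clients.consumer.KafkaConsumer
      import org.apache.kafka.common.TopicPartition

      // Illustrative helper: find the current 'latest' offset per partition with
      // the new consumer, by seeking to the end and reading back the position.
      def latestOffsets(consumer: KafkaConsumer[Array[Byte], Array[Byte]],
                        partitions: Seq[TopicPartition]): Map[TopicPartition, Long] = {
        consumer.assign(JArrays.asList(partitions: _*)) // manual assignment, no consumer-group rebalance
        consumer.seekToEnd(partitions: _*)              // 0.9 API takes varargs TopicPartition
        partitions.map(tp => tp -> consumer.position(tp)).toMap // position() now gives the log-end offset
      }
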
  Of course, one other benefit is that the new Consumer API abstracts away
  having to deal with finding the leader for a partition, so that code can
  go away.

  Would be great to get your thoughts.

  thanks in advance
  Mario



Re: Quick question regarding Maven and Spark Assembly jar

2015-12-04 Thread Sean Owen
I think one problem is that the assembly by nature includes a bunch of
particular versions of dependencies. Only one can be published, but it
would be unlikely to be the right flavor of assembly for any given
user.

On Fri, Dec 4, 2015 at 1:27 AM, Matt Cheah  wrote:
> Hi everyone,
>
> A very brief question out of curiosity – is there any particular reason why
> we don’t publish the Spark assembly jar on the Maven repository?
>
> Thanks,
>
> -Matt Cheah




Re: A proposal for Spark 2.0

2015-12-04 Thread Sean Owen
To be clearer, I don't think it's clear yet whether a 1.7 release
should exist or not. I could see both making sense. It's also not
really necessary to decide now, well before a 1.6 is even out in the
field. Deleting the version lost information, and I would not have
done that given my reply. Reynold maybe I can take this up with you
offline.

On Thu, Dec 3, 2015 at 6:03 PM, Mark Hamstra  wrote:
> Reynold's post from Nov. 25:
>
>> I don't think we should drop support for Scala 2.10, or make it harder in
>> terms of operations for people to upgrade.
>>
>> If there are further objections, I'm going to bump/remove the 1.7 version
>> and retarget things to 2.0 on JIRA.
>
>
> On Thu, Dec 3, 2015 at 12:47 AM, Sean Owen  wrote:
>>
>> Reynold, did you (or someone else) delete version 1.7.0 in JIRA? I
>> think that's premature. If there's a 1.7.0 then we've lost info about
>> what it would contain. It's trivial at any later point to merge the
>> versions. And, since things change and there's not a pressing need to
>> decide one way or the other, it seems fine to at least collect this
>> info like we have things like "1.4.3" that may never be released. I'd
>> like to add it back?
>>
>> On Thu, Nov 26, 2015 at 9:45 AM, Sean Owen  wrote:
>> > Maintaining both a 1.7 and 2.0 is too much work for the project, which
>> > is over-stretched now. This means that after 1.6 it's just small
>> > maintenance releases in 1.x and no substantial features or evolution.
>> > This means that the "in progress" APIs in 1.x that will stay that way,
>> > unless one updates to 2.x. It's not unreasonable, but means the update
>> > to the 2.x line isn't going to be that optional for users.
>> >
>> > Scala 2.10 is already EOL right? Supporting it in 2.x means supporting
>> > it for a couple years, note. 2.10 is still used today, but that's the
>> > point of the current stable 1.x release in general: if you want to
>> > stick to current dependencies, stick to the current release. Although
>> > I think that's the right way to think about support across major
>> > versions in general, I can see that 2.x is more of a required update
>> > for those following the project's fixes and releases. Hence may indeed
>> > be important to just keep supporting 2.10.
>> >
>> > I can't see supporting 2.12 at the same time (right?). Is that a
>> > concern? It will be long since GA by the time 2.x is first released.
>> >
>> > There's another fairly coherent worldview where development continues
>> > in 1.7 and focuses on finishing the loose ends and lots of bug fixing.
>> > 2.0 is delayed somewhat into next year, and by that time supporting
>> > 2.11+2.12 and Java 8 looks more feasible and more in tune with
>> > currently deployed versions.
>> >
>> > I can't say I have a strong view but I personally hadn't imagined 2.x
>> > would start now.
>> >
>> >
>> > On Thu, Nov 26, 2015 at 7:00 AM, Reynold Xin 
>> > wrote:
>> >> I don't think we should drop support for Scala 2.10, or make it harder
>> >> in
>> >> terms of operations for people to upgrade.
>> >>
>> >> If there are further objections, I'm going to bump/remove the 1.7
>> >> version
>> >> and retarget things to 2.0 on JIRA.




Re: Spark Streaming Kafka - DirectKafkaInputDStream: Using the new Kafka Consumer API

2015-12-04 Thread Cody Koeninger
A brute-force way to do it might be to just have a separate
streaming-kafka-new-consumer subproject, or something along those lines.
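
Something along the lines of this sbt sketch (the module name, path and the
`streaming` project reference are purely illustrative, and since the real
build is pom-driven the actual wiring would differ):

    // Illustrative only: a second Kafka module built against the new 0.9 consumer,
    // living next to the existing 0.8-based external/kafka module.
    lazy val streamingKafkaNewConsumer = (project in file("external/kafka-new-consumer"))
      .dependsOn(streaming % "provided")
      .settings(
        name := "spark-streaming-kafka-new-consumer",
        libraryDependencies += "org.apache.kafka" % "kafka-clients" % "0.9.0.0"
      )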

On Fri, Dec 4, 2015 at 3:12 AM, Mario Ds Briggs 
wrote:

> >>
> forcing people on kafka 8.x to upgrade their brokers is questionable.
> <<
>
> I agree, and I was thinking more that maybe there is a way to support both for a
> period of time (which of course means some more code to maintain :-)).
>
>
> thanks
> Mario