Looking at Tim's PR.

On Tue, Jul 31, 2018 at 10:53 AM Ismaël Mejía <[email protected]> wrote:
> Reuven. I already started review and hope to finish later on today or
> tomorrow at latest. If you can, it would be good to take a look at Tim's PR
> that has been opened for longer time.
>
> On Tue, Jul 31, 2018, 6:36 PM Tim Robertson <[email protected]> wrote:
>
>> I took a pass at reviewing (non committer). I haven't worked on unbounded
>> IO so wasn't familiar enough with the timestamp and checkpointing but
>> otherwise it LGTM in general - thanks John and for applying the minor
>> suggestions.
>>
>> OT: Reuven, if you have time on your hands there is also the KuduIO
>> awaiting review (https://github.com/apache/beam/pull/6021)
>>
>> On Tue, Jul 31, 2018 at 5:07 PM, Reuven Lax <[email protected]> wrote:
>>
>>> Ismael, do you have time for this review? If you're too busy, I can try
>>> to help review it.
>>>
>>> John, unfortunately, as Ismael said, even if we speed up the review the
>>> 2.6.0 branch has already been cut, and we try and only cherry pick
>>> important bugfixes. Hopefully the next release will be soon, and it's also
>>> possible to use the nightly Beam releases in the interim.
>>>
>>> Reuven
>>>
>>> On Tue, Jul 31, 2018 at 5:14 AM Ismaël Mejía <[email protected]> wrote:
>>>
>>>> Hi, we can try to speed up the review, but the 2.6.0 branch was
>>>> already cut and was stabilizing for the last two weeks, so I am not
>>>> sure it will make it. Next release should be cut shortly hopefully in
>>>> 3-4 weeks to follow the 6 week release plan. Hope this can work for
>>>> you.
>>>>
>>>> On Tue, Jul 31, 2018 at 2:13 AM John Rudolf Lewis <[email protected]> wrote:
>>>> >
>>>> > I created a pr for my SqsIO contribution. I look forward to your comments.
>>>> >
>>>> > https://github.com/apache/beam/pull/6101
>>>> >
>>>> > Any chance this could be a part of the 2.6.0 release?
>>>> >
>>>> > On Thu, Jul 19, 2018 at 7:39 AM, John Rudolf Lewis <[email protected]> wrote:
>>>> >>
>>>> >> Thank you.
>>>> >>
>>>> >> I've created a jira ticket to add SQS and have assigned it to myself:
>>>> >> https://issues.apache.org/jira/browse/BEAM-4828
>>>> >>
>>>> >> Modified the documentation to show it as in-progress:
>>>> >> https://github.com/apache/beam/pull/5995
>>>> >>
>>>> >> And will be starting my work here:
>>>> >> https://github.com/JohnRudolfLewis/beam/tree/Add-SqsIO
>>>> >>
>>>> >> On Thu, Jul 19, 2018 at 1:43 AM, Jean-Baptiste Onofré <[email protected]> wrote:
>>>> >>>
>>>> >>> Agree with Ismaël.
>>>> >>>
>>>> >>> I would be more than happy to help on this one (as I contributed on AMQP
>>>> >>> and JMS IOs ;)).
>>>> >>>
>>>> >>> Regards
>>>> >>> JB
>>>> >>>
>>>> >>> On 19/07/2018 10:39, Ismaël Mejía wrote:
>>>> >>> > Thanks for your interest John, it would be a really nice contribution
>>>> >>> > to add SQS support.
>>>> >>> >
>>>> >>> > Some context on the kinesis stuff:
>>>> >>> >
>>>> >>> > The reason why kinesis is still in a separate module is more related
>>>> >>> > to a licensing problem. Kinesis uses some native libraries that are
>>>> >>> > published under a not 100% apache compatible license and we are not
>>>> >>> > allowed to shade and republish them but it seems there is a workaround
>>>> >>> > now, for more details see
>>>> >>> > https://issues.apache.org/jira/browse/BEAM-3549
>>>> >>> > In any case if to use SQS you only need the Apache licensed aws-sdk
>>>> >>> > deps it is ok (and a good idea) if you put it in the
>>>> >>> > amazon-web-services module.
>>>> >>> >
>>>> >>> > The kinesis connector is way more complex for multiple reasons, first,
>>>> >>> > the raw version of the amazon client libraries is not so ‘friendly’
>>>> >>> > and the guys who created KinesisIO had to do some workarounds to
>>>> >>> > provide accurate checkpointing/watermarks. So since SQS is a way
>>>> >>> > simpler system you should probably be ok basing it in simpler sources
>>>> >>> > like AMQP or JMS.
>>>> >>> >
>>>> >>> > If you feel like to, please create the JIRA and don’t hesitate to ask
>>>> >>> > questions if you find issues or if you need some review.
>>>> >>> >
>>>> >>> > On Thu, Jul 19, 2018 at 12:55 AM Lukasz Cwik <[email protected]> wrote:
>>>> >>> >>
>>>> >>> >> On Wed, Jul 18, 2018 at 3:30 PM John Rudolf Lewis <[email protected]> wrote:
>>>> >>> >>>
>>>> >>> >>> I need an SQS source for my project that is using beam. A brief
>>>> >>> >>> search did not turn up any in-progress work in this area. Please point me
>>>> >>> >>> to the right repo if I missed it.
>>>> >>> >>
>>>> >>> >> To my knowledge there is none and nobody has marked it in
>>>> >>> >> progress on https://beam.apache.org/documentation/io/built-in/. It
>>>> >>> >> would be good to create a JIRA issue on https://issues.apache.org/ and
>>>> >>> >> send a PR to add SQS to the inprogress list referencing your JIRA. I added
>>>> >>> >> you as a contributor in JIRA so you should be able to assign yourself to
>>>> >>> >> any issues that you create.
>>>> >>> >>
>>>> >>> >>> Assuming there is no in-progress effort, I would like to
>>>> >>> >>> contribute an Amazon SQS source. I have a few questions before I begin.
>>>> >>> >>
>>>> >>> >> Great, note that this is a good starting point for authoring an
>>>> >>> >> IO transform:
>>>> >>> >> https://beam.apache.org/documentation/io/authoring-overview/
>>>> >>> >>
>>>> >>> >>> It seems that the current AWS code is split into two different
>>>> >>> >>> modules: sdk/java/io/amazon-web-services which contains the S3FileSystem,
>>>> >>> >>> AwsOptions, etc, and sdk/java/io/kinesis which contains an unbounded source
>>>> >>> >>> based on a kinesis topic. I'd like to add this source to the
>>>> >>> >>> amazon-web-services module since I'd like to depend on AwsOptions. Does
>>>> >>> >>> adding this source to the amazon-web-services module make sense?
>>>> >>> >>
>>>> >>> >> Putting it inside of amazon-web-services makes a lot of sense.
>>>> >>> >> The Google connectors all live within the one package and there has been
>>>> >>> >> discussion to consolidate all the AWS stuff under amazon-web-services.
>>>> >>> >>
>>>> >>> >>> Also, the kinesis source looks a touch more complex than other
>>>> >>> >>> sources. Both the JMS and AMQP sources look like better examples to follow.
>>>> >>> >>> Which existing source would be the best to model this contribution after?
>>>> >>> >>
>>>> >>> >> Some of it has to do with how many ways a source can be read and
>>>> >>> >> how complicated the watermark tracking but it would be best if the IO
>>>> >>> >> authors comment on implementation details.
>>>> >>> >>
>>>> >>> >>> If anyone has put some thoughts into this, or better yet some
>>>> >>> >>> code, I'd appreciate hearing from you.
>>>> >>> >>>
>>>> >>> >>> Thanks!
>>>> >>>
>>>> >>> --
>>>> >>> Jean-Baptiste Onofré
>>>> >>> [email protected]
>>>> >>> http://blog.nanthrax.net
>>>> >>> Talend - http://www.talend.com
