I took a pass at reviewing (non committer). I haven't worked on unbounded IO so wasn't familiar enough with the timestamp and checkpointing but otherwise it LGTM in general - thanks John and for applying the minor suggestions.
OT: Reuven, if you have time on your hands there is also the KuduIO awaiting review (https://github.com/apache/beam/pull/6021) On Tue, Jul 31, 2018 at 5:07 PM, Reuven Lax <[email protected]> wrote: > Ismael, do you have time for this review? If you're too busy, I can try to > help review it. > > John, unfortunately, as Ismael said, even if we speed up the review the > 2.6.0 branch has already been cut, and we try and only cherry pick > important bugfixes. Hopefully the next release will be soon, and it's also > possible to use the nightly Beam releases in the interim. > > Reuven > > On Tue, Jul 31, 2018 at 5:14 AM Ismaël Mejía <[email protected]> wrote: > >> Hi, we can try to speed up the review, but the 2.6.0 branch was >> already cut and was stabilizing for the last two weeks, so I am not >> sure it will make it. Next release should be cut shortly hopefully in >> 3-4 weeks to follow the 6 week release plan. Hope this can work for >> you. >> >> On Tue, Jul 31, 2018 at 2:13 AM John Rudolf Lewis <[email protected]> >> wrote: >> > >> > I created a pr for my SqsIO contribution. I look forward to your >> comments. >> > >> > https://github.com/apache/beam/pull/6101 >> > >> > Any chance this could be a part of the 2.6.0 release? >> > >> > On Thu, Jul 19, 2018 at 7:39 AM, John Rudolf Lewis < >> [email protected]> wrote: >> >> >> >> Thank you. >> >> >> >> I've created a jira ticket to add SQS and have assigned it to myself: >> https://issues.apache.org/jira/browse/BEAM-4828 >> >> >> >> Modified the documentation to show it as in-progress: >> https://github.com/apache/beam/pull/5995 >> >> >> >> And will be starting my work here: https://github.com/ >> JohnRudolfLewis/beam/tree/Add-SqsIO >> >> >> >> >> >> On Thu, Jul 19, 2018 at 1:43 AM, Jean-Baptiste Onofré <[email protected]> >> wrote: >> >>> >> >>> Agree with Ismaël. >> >>> >> >>> I would be more than happy to help on this one (as I contributed on >> AMQP >> >>> and JMS IOs ;)). >> >>> >> >>> Regards >> >>> JB >> >>> >> >>> On 19/07/2018 10:39, Ismaël Mejía wrote: >> >>> > Thanks for your interest John, it would be a really nice >> contribution >> >>> > to add SQS support. >> >>> > >> >>> > Some context on the kinesis stuff: >> >>> > >> >>> > The reason why kinesis is still in a separate module is more related >> >>> > to a licensing problem. Kinesis uses some native libraries that are >> >>> > published under a not 100% apache compatible license and we are not >> >>> > allowed to shade and republish them but it seems there is a >> workaround >> >>> > now, for more details see >> >>> > https://issues.apache.org/jira/browse/BEAM-3549 >> >>> > In any case if to use SQS you only need the Apache licensed aws-sdk >> >>> > deps it is ok (and a good idea) if you put it in the >> >>> > amazon-web-services module. >> >>> > >> >>> > The kinesis connector is way more complex for multiple reasons, >> first, >> >>> > the raw version of the amazon client libraries is not so ‘friendly’ >> >>> > and the guys who created KinesisIO had to do some workarounds to >> >>> > provide accurate checkpointing/watermarks. So since SQS is a way >> >>> > simpler system you should probably be ok basing it in simpler >> sources >> >>> > like AMQP or JMS. >> >>> > >> >>> > If you feel like to, please create the JIRA and don’t hesitate to >> ask >> >>> > questions if you find issues or if you need some review. >> >>> > >> >>> > On Thu, Jul 19, 2018 at 12:55 AM Lukasz Cwik <[email protected]> >> wrote: >> >>> >> >> >>> >> >> >>> >> >> >>> >> On Wed, Jul 18, 2018 at 3:30 PM John Rudolf Lewis < >> [email protected]> wrote: >> >>> >>> >> >>> >>> I need an SQS source for my project that is using beam. A brief >> search did not turn up any in-progress work in this area. Please point me >> to the right repo if I missed it. >> >>> >> >> >>> >> >> >>> >> To my knowledge there is none and nobody has marked it in progress >> on https://beam.apache.org/documentation/io/built-in/. It would be good >> to create a JIRA issue on https://issues.apache.org/ and send a PR to >> add SQS to the inprogress list referencing your JIRA. I added you as a >> contributor in JIRA so you should be able to assign yourself to any issues >> that you create. >> >>> >> >> >>> >>> >> >>> >>> Assuming there is no in-progress effort, I would like to >> contribute an Amazon SQS source. I have a few questions before I begin. >> >>> >> >> >>> >> >> >>> >> Great, note that this is a good starting point for authoring an IO >> transform: https://beam.apache.org/documentation/io/authoring-overview/ >> >>> >> >> >>> >>> >> >>> >>> >> >>> >>> It seems that the current AWS code is split into two different >> modules: sdk/java/io/amazon-web-services which contains the >> S3FileSystem, AwsOptions, etc, and sdk/java/io/kinesis which contains an >> unbounded source based on a kinesis topic. I'd like to add this source to >> the amazon-web-services module since I'd like to depend on AwsOptions. Does >> adding this source to the amazon-web-services module make sense? >> >>> >> >> >>> >> >> >>> >> Putting it inside of amazon-web-services makes a lot of sense. The >> Google connectors all live within the one package and there has been >> discussion to consolidate all the AWS stuff under amazon-web-services. >> >>> >> >> >>> >>> >> >>> >>> Also, the kinesis source looks a touch more complex than other >> sources. Both the JMS and AMQP sources look like better examples to follow. >> Which existing source would be the best to model this contribution after? >> >>> >> >> >>> >> >> >>> >> Some of it has to do with how many ways a source can be read and >> how complicated the watermark tracking but it would be best if the IO >> authors comment on implementation details. >> >>> >> >> >>> >>> >> >>> >>> If anyone has put some thoughts into this, or better yet some >> code, I'd appreciate hearing from you. >> >>> >>> >> >>> >>> Thanks! >> >>> >>> >> >>> >> >>> -- >> >>> Jean-Baptiste Onofré >> >>> [email protected] >> >>> http://blog.nanthrax.net >> >>> Talend - http://www.talend.com >> >> >> >> >> > >> >
