Thanks for your interest, John. It would be a really nice contribution to add SQS support.
Some context on the Kinesis stuff: the reason Kinesis is still in a separate module is mostly a licensing problem. Kinesis uses some native libraries that are published under a license that is not 100% Apache-compatible, and we are not allowed to shade and republish them. It seems there is a workaround now; for more details see https://issues.apache.org/jira/browse/BEAM-3549

In any case, if SQS only needs the Apache-licensed aws-sdk deps, it is OK (and a good idea) to put it in the amazon-web-services module.

The Kinesis connector is way more complex for multiple reasons. For one, the raw version of the Amazon client libraries is not so ‘friendly’, and the guys who created KinesisIO had to do some workarounds to provide accurate checkpointing/watermarks. Since SQS is a much simpler system, you should probably be OK basing it on simpler sources like AMQP or JMS.

If you feel like it, please create the JIRA, and don’t hesitate to ask questions if you find issues or if you need a review.

On Thu, Jul 19, 2018 at 12:55 AM Lukasz Cwik <[email protected]> wrote:
>
> On Wed, Jul 18, 2018 at 3:30 PM John Rudolf Lewis <[email protected]>
> wrote:
>>
>> I need an SQS source for my project that is using beam. A brief search did
>> not turn up any in-progress work in this area. Please point me to the right
>> repo if I missed it.
>
> To my knowledge there is none and nobody has marked it in progress on
> https://beam.apache.org/documentation/io/built-in/. It would be good to
> create a JIRA issue on https://issues.apache.org/ and send a PR to add SQS to
> the inprogress list referencing your JIRA. I added you as a contributor in
> JIRA so you should be able to assign yourself to any issues that you create.
>
>>
>> Assuming there is no in-progress effort, I would like to contribute an
>> Amazon SQS source. I have a few questions before I begin.
>
> Great, note that this is a good starting point for authoring an IO transform:
> https://beam.apache.org/documentation/io/authoring-overview/
>
>>
>> It seems that the current AWS code is split into two different modules:
>> sdk/java/io/amazon-web-services which contains the S3FileSystem, AwsOptions,
>> etc, and sdk/java/io/kinesis which contains an unbounded source based on a
>> kinesis topic. I'd like to add this source to the amazon-web-services module
>> since I'd like to depend on AwsOptions. Does adding this source to the
>> amazon-web-services module make sense?
>
> Putting it inside of amazon-web-services makes a lot of sense. The Google
> connectors all live within the one package and there has been discussion to
> consolidate all the AWS stuff under amazon-web-services.
>
>>
>> Also, the kinesis source looks a touch more complex than other sources. Both
>> the JMS and AMQP sources look like better examples to follow. Which existing
>> source would be the best to model this contribution after?
>
> Some of it has to do with how many ways a source can be read and how
> complicated the watermark tracking but it would be best if the IO authors
> comment on implementation details.
>
>>
>> If anyone has put some thoughts into this, or better yet some code, I'd
>> appreciate hearing from you.
>>
>> Thanks!
>>
