Hi all,

I think this can be added under java --> io --> aws-cloud-platform, so that more IO connectors (e.g. S3) can be added there later.

Regards,
Tarush

On Mon, Jun 12, 2017 at 4:03 AM, Madhusudan Borkar <mbor...@etouch.net> wrote:

> Yes, I believe so. Thanks for the Jira.
>
> Madhu Borkar
>
> On Sat, Jun 10, 2017 at 10:36 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>
> > Hi,
> >
> > I created a Jira to add custom splitting to JdbcIO (but it's not so
> > trivial, depending on the backends).
> >
> > Regarding your proposal, it sounds interesting, but do you think we will
> > really have a "parallel" read of the splits? I think splitting makes sense
> > if we can read in parallel: if we split but still read from a single
> > backend, it doesn't bring much improvement.
> >
> > Regards
> > JB
> >
> > On 06/10/2017 09:28 PM, Madhusudan Borkar wrote:
> >
> >> Hi,
> >> We are proposing to develop a connector for AWS Aurora. Aurora, being a
> >> cluster for a relational database (MySQL), has no Java API for
> >> reading/writing other than a JDBC client. Although there is a JdbcIO
> >> available, it looks like it doesn't work in parallel. The proposal is to
> >> provide split functionality and then use a transform to parallelize the
> >> operation. As mentioned above, this is a typical SQL-based database and
> >> not comparable with the likes of Hive. The Hive implementation is based
> >> on an abstraction over the HDFS file system of Hadoop, which provides
> >> splits. Here none of these are applicable.
> >> During implementation of the Hive connector there was a lot of
> >> discussion about how to implement the connector while strictly following
> >> the Beam design principles using a bounded source. I am not sure how an
> >> Aurora connector will fit into these design principles.
> >> Here is our proposal.
> >> 1. Split functionality: If the table contains 'x' rows, it will be split
> >> into 'n' bundles in the split method. This would be done as follows:
> >> noOfSplits = 'x' * size of a single row / bundleSize hint from runner.
> >> 2. Each of these 'pseudo' splits would then be read in parallel.
> >> 3. Each of these reads will use a DB connection from a connection pool.
> >> This will provide better benchmarking. Please let us know your views.
> >>
> >> Thanks
> >> Madhu Borkar
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
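
To make step 1 of the quoted proposal concrete, here is a minimal sketch of the split arithmetic (noOfSplits = 'x' * size of a single row / bundleSize hint). It is only an illustration under assumptions: the class and method names (SplitPlanner, numSplits, rowRanges) are hypothetical and not part of Beam or JdbcIO, the result is rounded up and capped at the row count, and describing each 'pseudo' split as an {offset, limit} pair is my assumption, since the thread does not say how a split maps to a query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Beam or JdbcIO.
final class SplitPlanner {
    private SplitPlanner() {}

    /**
     * noOfSplits = rowCount * rowSizeBytes / bundleSizeBytes, rounded up.
     * Falls back to a single split when any input is non-positive, and never
     * returns more splits than there are rows.
     */
    static long numSplits(long rowCount, long rowSizeBytes, long bundleSizeBytes) {
        if (rowCount <= 0 || rowSizeBytes <= 0 || bundleSizeBytes <= 0) {
            return 1;
        }
        long totalBytes = Math.multiplyExact(rowCount, rowSizeBytes);
        long splits = (totalBytes + bundleSizeBytes - 1) / bundleSizeBytes; // ceiling division
        return Math.max(1, Math.min(splits, rowCount));
    }

    /**
     * Divides rows [0, rowCount) into contiguous {offset, limit} pairs of
     * near-equal size. Assumption: each pair describes one 'pseudo' split.
     */
    static List<long[]> rowRanges(long rowCount, long splits) {
        List<long[]> ranges = new ArrayList<>();
        if (rowCount <= 0) {
            return ranges;
        }
        long n = Math.max(1, Math.min(splits, rowCount));
        long base = rowCount / n;
        long remainder = rowCount % n;
        long offset = 0;
        for (long i = 0; i < n; i++) {
            // The first 'remainder' splits take one extra row each.
            long limit = base + (i < remainder ? 1 : 0);
            ranges.add(new long[] {offset, limit});
            offset += limit;
        }
        return ranges;
    }
}
```

For example, 1000 rows of about 100 bytes each with a 10000-byte bundle size hint gives 10 splits. Whether reading those ranges concurrently actually helps still depends on JB's point above: if all reads hit a single backend, the gain may be small.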