Hi All,

I think this can be added under java --> io --> aws-cloud-platform, where
more IO connectors (e.g. S3) can also be added later.

Regards,
Tarush

On Mon, Jun 12, 2017 at 4:03 AM, Madhusudan Borkar <mbor...@etouch.net>
wrote:

> Yes, I believe so. Thanks for the Jira.
>
> Madhu Borkar
>
> On Sat, Jun 10, 2017 at 10:36 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
> > Hi,
> >
> > I created a Jira to add custom splitting to JdbcIO (but it's not so
> > trivial, depending on the backend).
> >
> > Regarding your proposal, it sounds interesting, but do you think we will
> > really have "parallel" reads of the splits? I think splitting makes sense
> > if we can do parallel reads: if we split to read on a single backend, it
> > doesn't bring a lot of improvement.
> >
> > Regards
> > JB
> >
> >
> > On 06/10/2017 09:28 PM, Madhusudan Borkar wrote:
> >
> >> Hi,
> >> We are proposing to develop a connector for AWS Aurora. Aurora, being a
> >> cluster for a relational database (MySQL), has no Java API for
> >> reading/writing other than the JDBC client. Although there is a JdbcIO
> >> available, it looks like it doesn't work in parallel. The proposal is to
> >> provide split functionality and then use a transform to parallelize the
> >> operation. As mentioned above, this is a typical SQL-based database and
> >> not comparable with the likes of Hive. The Hive implementation is based
> >> on an abstraction over the HDFS file system of Hadoop, which provides
> >> splits. Here none of these are applicable.
> >> During implementation of the Hive connector there was a lot of
> >> discussion about how to implement the connector while strictly
> >> following Beam design principles using a Bounded source. I am not sure
> >> how an Aurora connector will fit into these design principles.
> >> Here is our proposal.
> >> 1. Split functionality: If the table contains 'x' rows, it will be split
> >> into 'n' bundles in the split method. This would be done as follows:
> >> noOfSplits = 'x' * size of a single row / bundleSize hint from runner.
> >> 2. Then each of these 'pseudo' splits would be read in parallel.
> >> 3. Each of these reads will use a DB connection from a connection pool.
> >> This will provide better benchmarking. Please let us know your views.
> >>
> >> Thanks
> >> Madhu Borkar
> >>
> >>
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>
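
For reference, the split calculation in step 1 of the quoted proposal could
be sketched roughly as below. This is only an illustration under stated
assumptions, not the proposed connector and not part of JdbcIO: the class
SplitSketch, its methods, and the RowRange type are hypothetical names, and
mapping each split to a contiguous row range is an assumption the proposal
does not spell out. The result is rounded up so that no rows are dropped.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Beam or JdbcIO. */
final class SplitSketch {

  /** Half-open row range [startRow, endRow) assigned to one split. */
  static final class RowRange {
    final long startRow;
    final long endRow;

    RowRange(long startRow, long endRow) {
      this.startRow = startRow;
      this.endRow = endRow;
    }
  }

  /**
   * noOfSplits = rowCount * rowSizeBytes / desiredBundleSizeBytes,
   * rounded up, and never less than 1.
   */
  static long numSplits(long rowCount, long rowSizeBytes, long desiredBundleSizeBytes) {
    if (rowCount < 0 || rowSizeBytes <= 0 || desiredBundleSizeBytes <= 0) {
      throw new IllegalArgumentException("sizes must be positive, rowCount non-negative");
    }
    // multiplyExact throws instead of silently overflowing on huge tables.
    long totalBytes = Math.multiplyExact(rowCount, rowSizeBytes);
    long splits = totalBytes / desiredBundleSizeBytes
        + (totalBytes % desiredBundleSizeBytes == 0 ? 0 : 1);
    return Math.max(1, splits);
  }

  /** Divides rowCount rows into numSplits contiguous, near-equal ranges. */
  static List<RowRange> splitRanges(long rowCount, long numSplits) {
    if (rowCount < 0 || numSplits <= 0) {
      throw new IllegalArgumentException("numSplits must be positive, rowCount non-negative");
    }
    List<RowRange> ranges = new ArrayList<>();
    long base = rowCount / numSplits;
    long remainder = rowCount % numSplits;
    long start = 0;
    for (long i = 0; i < numSplits; i++) {
      // The first `remainder` splits take one extra row each.
      long size = base + (i < remainder ? 1 : 0);
      ranges.add(new RowRange(start, start + size));
      start += size;
    }
    return ranges;
  }
}
```

Each range could then be read by a separate worker (steps 2 and 3), for
example with a range predicate on an indexed key. Whether that gives a real
speed-up depends on the backend, as JB points out in the thread.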
