MongoDbIO is based on the BoundedSource framework, so there's no easy way to
introduce custom code (a ParDo) that precedes it in a pipeline. A ReadAll
transform (as JB mentioned) will be ParDo-based, so you will be able to
put a custom ParDo in front of it that runs the initialization and feeds
data into the read. So I agree that this is the proper solution.
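
To make the "runs the initialization" part concrete, here is a rough sketch in
plain Java of a once-per-JVM guard that such a preceding ParDo could call, for
example from a DoFn method annotated with @Setup. The class name WorkerSslInit,
its methods, and the trust store path and password are all hypothetical; only
the two javax.net.ssl.* system properties are standard JSSE names. It assumes
the trust store file is already present on each worker and that nothing in the
JVM has created the default SSLContext yet.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: one-time SSL trust store setup per worker JVM. */
final class WorkerSslInit {
  // Counts how often the initialization body actually ran (illustration only).
  private static final AtomicInteger RUNS = new AtomicInteger();
  private static boolean initialized = false;

  private WorkerSslInit() {}

  /**
   * Safe to call from every DoFn instance; only the first call in a JVM
   * does any work, later calls return immediately.
   */
  static synchronized void ensureInitialized(String trustStorePath, String trustStorePassword) {
    if (initialized) {
      return;
    }
    // Standard JSSE properties, read when the default SSLContext is first created.
    System.setProperty("javax.net.ssl.trustStore", trustStorePath);
    System.setProperty("javax.net.ssl.trustStorePassword", trustStorePassword);
    RUNS.incrementAndGet();
    initialized = true;
  }

  /** Number of times the initialization body ran in this JVM. */
  static int runCount() {
    return RUNS.get();
  }
}
```

Each DoFn instance can call WorkerSslInit.ensureInitialized(...) before any
MongoDB client is created; the static guard keeps the setup from repeating
when a worker hosts several DoFn instances.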

The downside is that some advanced features (for example, dynamic work
rebalancing) will not be supported until Splittable DoFn is fully fleshed
out. But it looks like MongoDB currently does not support this feature
anyway, so it should be OK.

Thanks,
Cham

On Mon, Oct 15, 2018 at 7:08 AM Jean-Baptiste Onofré <[email protected]>
wrote:

> JdbcIO uses the following:
>
>       return input
>           .apply(Create.of((Void) null))
>           .apply(
>               JdbcIO.<Void, T>readAll()
>                   .withDataSourceConfiguration(getDataSourceConfiguration())
>                   .withQuery(getQuery())
>                   .withCoder(getCoder())
>                   .withRowMapper(getRowMapper())
>                   .withFetchSize(getFetchSize())
>                   .withParameterSetter(
>                       (element, preparedStatement) -> {
>                         if (getStatementPreparator() != null) {
>                           getStatementPreparator().setParameters(preparedStatement);
>                         }
>                       }));
>
> You can see that PBegin triggers readAll() that actually fires the
> configuration and fetching.
>
> I think we can do the same in MongoDbIO.
>
> Regards
> JB
>
> On 15/10/2018 16:00, Chaim Turkel wrote:
> > what would be the implementation for the JdbcIO?
> > On Mon, Oct 15, 2018 at 2:47 PM Jean-Baptiste Onofré <[email protected]> wrote:
> >>
> >> If you want to reuse MongoDbIO, there's no easy way.
> >>
> >> However, I can introduce the same change we did in Jdbc or Elasticsearch
> >> IOs.
> >>
> >> Agree?
> >>
> >> Regards
> >> JB
> >>
> >> On 15/10/2018 13:46, Chaim Turkel wrote:
> >>> Thanks,
> >>>   I need to wrap MongoDbIO.read, and don't see an easy way to do it
> >>> chaim
> >>> On Mon, Oct 15, 2018 at 2:30 PM Jean-Baptiste Onofré <[email protected]> wrote:
> >>>>
> >>>> Hi Chaim,
> >>>>
> >>>> You can take a look at JdbcIO.
> >>>>
> >>>> You can create any "startup" PCollection on the PBegin, and then you
> >>>> can run the DoFn based on that.
> >>>>
> >>>> Regards
> >>>> JB
> >>>>
> >>>> On 15/10/2018 13:00, Chaim Turkel wrote:
> >>>>> Hi,
> >>>>>   Is there a way to run code before the PBegin?
> >>>>> I am writing a pipeline that connects to MongoDB over SSL with a
> >>>>> self-signed certificate. I need to initialize Java's SSL setup before
> >>>>> the MongoDB code runs. So I need to write code that comes before the
> >>>>> PBegin but runs on each node?
> >>>>>
> >>>>>
> >>>>> Chaim
> >>>>>
> >>>>
> >>>> --
> >>>> Jean-Baptiste Onofré
> >>>> [email protected]
> >>>> http://blog.nanthrax.net
> >>>> Talend - http://www.talend.com
> >>>
> >>
> >
>
>
