Thanks Peter! Please also make sure to use SourceTestUtils to verify that
your FileBasedSource is well-behaved w.r.t. dynamic work rebalancing
(especially the various assertSplitAtFraction methods). For examples, see
XmlSourceTest
<https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/sdk/src/test/java/com/google/cloud/dataflow/sdk/io/XmlSourceTest.java>.
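
To make the property concrete: the sketch below is not SDK code and does not call SourceTestUtils. It is a plain-Java illustration, under simplifying assumptions, of the invariant those assertions are meant to check: after a split, reading the primary and residual ranges should yield exactly the records of the unsplit source. The names SplitConsistencySketch, readRange and splitsAreConsistent, the single-character delimiter, and the rule that a record belongs to the range containing its first character are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Dataflow SDK.
final class SplitConsistencySketch {

    // Returns the records whose first character lies in [start, end).
    // Records are separated by a single delimiter character (an assumption).
    static List<String> readRange(String data, char delim, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        if (start > 0) {
            // A record starts right after a delimiter, so look for the first
            // delimiter at or after start - 1 and begin just past it.
            int prev = data.indexOf(delim, start - 1);
            if (prev < 0) {
                return records; // no record starts in this range
            }
            pos = prev + 1;
        }
        while (pos < end && pos < data.length()) {
            int next = data.indexOf(delim, pos);
            int stop = (next < 0) ? data.length() : next;
            records.add(data.substring(pos, stop));
            pos = stop + 1;
        }
        return records;
    }

    // Checks that, for every split offset, primary + residual equals the whole.
    static boolean splitsAreConsistent(String data, char delim) {
        List<String> whole = readRange(data, delim, 0, data.length());
        for (int split = 0; split <= data.length(); split++) {
            List<String> combined = new ArrayList<>(readRange(data, delim, 0, split));
            combined.addAll(readRange(data, delim, split, data.length()));
            if (!combined.equals(whole)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(splitsAreConsistent("aa;bbb;c", ';'));
    }
}
```

Note that a real source is split while a reader is partway through it, which this static check does not model; that is the part the SourceTestUtils assertions exercise.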

On Mon, Mar 14, 2016 at 12:10 PM Giesin, Peter <[email protected]>
wrote:

> The MultiLineIO is a BoundedSource and an extension of FileBasedSource.
> Where the FileBasedSource reads a single line at a time the MultiLineIO
> allows the user to define an arbitrary “message” delimiter. It then reads
> through the file, removing newlines, until the separator is read, finally
> returning the character sequence that is built.
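
A rough sketch of the reading loop described above, in plain Java with no SDK dependency. MultiLineSketch and readMessages are hypothetical names, not the actual MultiLineIO code. It is also assumed here that the delimiter contains no newline characters and that a final message without a trailing delimiter is still returned.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the actual MultiLineIO implementation.
final class MultiLineSketch {

    // Reads delimiter-terminated messages, dropping CR and LF characters.
    // Assumes the delimiter itself contains no newline characters.
    static List<String> readMessages(Reader in, String delimiter) throws IOException {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> messages = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        BufferedReader reader = new BufferedReader(in);
        int c;
        while ((c = reader.read()) != -1) {
            if (c == '\n' || c == '\r') {
                continue; // newlines are removed from the message
            }
            current.append((char) c);
            // Check whether the buffer now ends with the delimiter.
            int tail = current.length() - delimiter.length();
            if (tail >= 0 && current.indexOf(delimiter, tail) == tail) {
                messages.add(current.substring(0, tail));
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            messages.add(current.toString()); // assumed: keep an unterminated final message
        }
        return messages;
    }

    public static void main(String[] args) throws IOException {
        String input = "first line\nsecond line||next\nmessage||";
        for (String message : readMessages(new StringReader(input), "||")) {
            System.out.println(message);
        }
    }
}
```

Because newlines are stripped before matching, a delimiter that happens to be broken across a line break in the file is still recognized; whether that is desirable depends on the file format.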
>
>
>
> I believe it is already built using the new style but I will compare it to
> BigtableIO to confirm that.
>
> Peter
>
> On 3/14/16, 1:50 PM, "Jean-Baptiste Onofré" <[email protected]> wrote:
>
> >I second Eugene here.
> >
> >In the past, I developed some IOs using the "old style" (as was done in
> >PubSubIO). I'm now refactoring it to use the "new style".
> >
> >Regards
> >JB
> >
> >On 03/14/2016 06:47 PM, Eugene Kirpichov wrote:
> >> Hi Peter,
> >> Looking forward to your PR. Please note that source classes are relatively
> >> tricky to develop, so would you mind briefly explaining what your source
> >> will do here over email, so that we hash out some possible issues early
> >> rather than in PR comments?
> >> Also note that we now recommend packaging IO connectors as PTransforms,
> >> making the PTransform class itself be a builder - while the Source/Sink
> >> classes should be kept package-private (rather than exposed to the user).
> >> For an example of a connector packaged in this style, see BigtableIO
> >> <https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/sdk/src/main/java/com/google/cloud/dataflow/sdk/io/bigtable/BigtableIO.java>.
> >> The advantage is that this style allows you to restructure the connector or
> >> add additional transforms into its implementation if necessary, without
> >> changing the call sites. It might seem less important in the case of a simple
> >> connector like reading lines from a file, but it will become much more
> >> important with things like SplittableDoFn
> >> <https://issues.apache.org/jira/browse/BEAM-65>.
> >>
> >> On Mon, Mar 14, 2016 at 10:29 AM Jean-Baptiste Onofré <[email protected]>
> >> wrote:
> >>
> >>> Hi Peter,
> >>>
> >>> awesome !
> >>>
> >>> Yes, you can create the PR using the github mirror.
> >>>
> >>> Does your MultiLineIO use Bounded/Unbounded "new" classes ?
> >>>
> >>> Regards
> >>> JB
> >>>
> >>> On 03/14/2016 06:23 PM, Giesin, Peter wrote:
> >>>> Hi all!
> >>>>
> >>>> I am looking to get involved in the project. I have a MultiLineIO
> >>>> file-based source that I think would be useful. I know the project is just
> >>>> spinning up but can I simply clone the repo and create a PR for the new IO?
> >>>> Also looked over JIRA and there are some tickets I can help out with.
> >>>>
> >>>> Best regards,
> >>>> Peter Giesin
> >>>> [email protected]
> >>>>
> >>>>
> >>>> _____________
> >>>> The information contained in this message is proprietary and/or
> >>> confidential. If you are not the intended recipient, please: (i)
> delete the
> >>> message and all copies; (ii) do not disclose, distribute or use the
> message
> >>> in any manner; and (iii) notify the sender immediately. In addition,
> please
> >>> be aware that any message addressed to our domain is subject to
> archiving
> >>> and review by persons other than the intended recipient. Thank you.
> >>>>
> >>>
> >>> --
> >>> Jean-Baptiste Onofré
> >>> [email protected]
> >>> http://blog.nanthrax.net
> >>> Talend - http://www.talend.com
> >>>
> >>
> >
> >--
> >Jean-Baptiste Onofré
> >[email protected]
> >http://blog.nanthrax.net
> >Talend - http://www.talend.com
> >
>
