Thanks, Peter! Please also make sure to use SourceTestUtils to verify that your FileBasedSource is well-behaved w.r.t. dynamic work rebalancing (especially the various assertSplitAtFraction methods). For examples, see XmlSourceTest <https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/sdk/src/test/java/com/google/cloud/dataflow/sdk/io/XmlSourceTest.java>.
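
To make the split-consistency idea concrete, here is a minimal plain-Java sketch. It does not use the SDK or SourceTestUtils, and the class and method names (DelimitedSplitSketch, readRange, splitIsConsistent) are hypothetical. It assumes the convention that a record belongs to the byte range in which it starts, and it models the delimiter-based reading described in the quoted message: newlines are removed and a record ends at the user-defined separator. The property checked is that the records of the two halves of a split, concatenated, equal the records of the unsplit input.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; none of these names come from the SDK.
class DelimitedSplitSketch {

    // Returns the records whose first character sits at an offset in [start, end).
    // Records are separated by `delimiter`; newlines inside a record are dropped.
    // For simplicity this always scans from offset 0; a real source would seek to
    // `start` and resynchronise on the next delimiter instead.
    static List<String> readRange(String data, String delimiter, int start, int end) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int recordStart = 0;
        while (recordStart < data.length() && recordStart < end) {
            int delimAt = data.indexOf(delimiter, recordStart);
            int recordEnd = (delimAt < 0) ? data.length() : delimAt;
            if (recordStart >= start) {
                String raw = data.substring(recordStart, recordEnd);
                records.add(raw.replace("\r", "").replace("\n", ""));
            }
            recordStart = recordEnd + delimiter.length();
        }
        return records;
    }

    // True if reading [0, splitAt) and then [splitAt, length) yields exactly the
    // same records, in the same order, as reading the whole input at once.
    static boolean splitIsConsistent(String data, String delimiter, int splitAt) {
        List<String> whole = readRange(data, delimiter, 0, data.length());
        List<String> combined = new ArrayList<>(readRange(data, delimiter, 0, splitAt));
        combined.addAll(readRange(data, delimiter, splitAt, data.length()));
        return whole.equals(combined);
    }

    public static void main(String[] args) {
        String data = "a\nb|c\nd|e";
        // Try every possible split offset, including both ends of the input.
        for (int splitAt = 0; splitAt <= data.length(); splitAt++) {
            if (!splitIsConsistent(data, "|", splitAt)) {
                throw new AssertionError("inconsistent split at offset " + splitAt);
            }
        }
        System.out.println(readRange(data, "|", 0, data.length()));
    }
}
```

The exhaustive loop over split offsets is cheap for small inputs and tends to catch off-by-one errors around delimiter boundaries, which is where a delimiter-based reader is most likely to lose or duplicate a record.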

On Mon, Mar 14, 2016 at 12:10 PM Giesin, Peter <[email protected]> wrote:

> The MultiLineIO is a BoundedSource and an extension of FileBasedSource.
> Where the FileBasedSource reads a single line at a time, the MultiLineIO
> allows the user to define an arbitrary “message” delimiter. It then reads
> through the file, removing newlines, until the separator is read, finally
> returning the character sequence that is built.
>
> I believe it is already built using the new style but I will compare it to
> the BigTableIO to confirm that.
>
> Peter
>
> On 3/14/16, 1:50 PM, "Jean-Baptiste Onofré" <[email protected]> wrote:
>
> >I second Eugene here.
> >
> >In the past, I developed some IOs using the "old style" (as was done in
> >PubSubIO). I'm now refactoring it to use the "new style".
> >
> >Regards
> >JB
> >
> >On 03/14/2016 06:47 PM, Eugene Kirpichov wrote:
> >> Hi Peter,
> >> Looking forward to your PR. Please note that source classes are relatively
> >> tricky to develop, so would you mind briefly explaining what your source
> >> will do here over email, so that we hash out some possible issues early
> >> rather than in PR comments?
> >> Also note that we now recommend packaging IO connectors as PTransforms,
> >> making the PTransform class itself be a builder, while the Source/Sink
> >> classes should be kept package-private (rather than exposed to the user).
> >> For an example of a connector packaged in this style, see BigtableIO
> >> <https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/sdk/src/main/java/com/google/cloud/dataflow/sdk/io/bigtable/BigtableIO.java>.
> >> The advantage is that this style allows you to restructure the connector or
> >> add additional transforms into its implementation if necessary, without
> >> changing the call sites. It might seem less important in case of a simple
> >> connector like reading lines from file, but it will become much more
> >> important with things like SplittableDoFn
> >> <https://issues.apache.org/jira/browse/BEAM-65>.
> >>
> >> On Mon, Mar 14, 2016 at 10:29 AM Jean-Baptiste Onofré <[email protected]>
> >> wrote:
> >>
> >>> Hi Peter,
> >>>
> >>> awesome!
> >>>
> >>> Yes, you can create the PR using the github mirror.
> >>>
> >>> Does your MultiLineIO use Bounded/Unbounded "new" classes?
> >>>
> >>> Regards
> >>> JB
> >>>
> >>> On 03/14/2016 06:23 PM, Giesin, Peter wrote:
> >>>> Hi all!
> >>>>
> >>>> I am looking to get involved in the project. I have a MultiLineIO
> >>>> file-based source that I think would be useful. I know the project is just
> >>>> spinning up but can I simply clone the repo and create a PR for the new IO?
> >>>> Also looked over JIRA and there are some tickets I can help out with.
> >>>>
> >>>> Best regards,
> >>>> Peter Giesin
> >>>> [email protected]
> >>>>
> >>>> _____________
> >>>> The information contained in this message is proprietary and/or
> >>>> confidential. If you are not the intended recipient, please: (i) delete the
> >>>> message and all copies; (ii) do not disclose, distribute or use the message
> >>>> in any manner; and (iii) notify the sender immediately. In addition, please
> >>>> be aware that any message addressed to our domain is subject to archiving
> >>>> and review by persons other than the intended recipient. Thank you.
> >>>
> >>> --
> >>> Jean-Baptiste Onofré
> >>> [email protected]
> >>> http://blog.nanthrax.net
> >>> Talend - http://www.talend.com
