+1 Thanks for that Ismaël.
Regards
JB

On 06/02/2019 11:24, Ismaël Mejía wrote:
> Since it seems we have consensus on deprecating both transforms, I created
>
> BEAM-6605 Deprecate TextIO.readAll() and TextIO.ReadAll transform
> BEAM-6606 Deprecate AvroIO.readAll() and AvroIO.ReadAll transform
>
> Thanks everyone.
>
> On Fri, Feb 1, 2019 at 7:03 PM Chamikara Jayalath <chamik...@google.com>
> wrote:
>>
>> The Python SDK doesn't have FileIO yet, so let's keep the ReadAllFromFoo
>> transforms currently available for the various file types around until
>> we have that.
>>
>> Thanks,
>> Cham
>>
>> On Fri, Feb 1, 2019 at 7:41 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>>>
>>> Hi,
>>>
>>> readFiles() should be used IMHO. We should remove readAll() to avoid
>>> confusion.
>>>
>>> Regards
>>> JB
>>>
>>> On 30/01/2019 17:25, Ismaël Mejía wrote:
>>>> Hello,
>>>>
>>>> A ‘recent’ pattern of use in Beam is for file-based IOs to have a
>>>> `readAll()` implementation that matches a `PCollection` of file
>>>> patterns and reads the matched files, e.g. `TextIO`, `AvroIO`.
>>>> `ReadAll` is implemented by an expand function that matches files with
>>>> FileIO and then reads them using a format-specific `ReadFiles`
>>>> transform, e.g. TextIO.ReadFiles, AvroIO.ReadFiles. So in the end
>>>> `ReadAll` in the Java implementation is just a user-friendly API that
>>>> hides FileIO.match + ReadFiles.
>>>>
>>>> Most recent IOs do NOT implement ReadAll, to encourage the more
>>>> composable approach of FileIO + ReadFiles, e.g. XmlIO and ParquetIO.
>>>>
>>>> Implementing ReadAll as a wrapper is relatively easy and definitely
>>>> user friendly, but it has an issue: it may be error-prone and it adds
>>>> more code to maintain (mostly ‘repeated’ code). However, `readAll` is a
>>>> more abstract pattern that applies not only to file-based IOs, so it
>>>> also makes sense in other transforms that map a `PCollection` of read
>>>> requests, and it is the basis for SDF-based composable APIs like the
>>>> recent `HBaseIO.readAll()`.
>>>>
>>>> So the question is, should we:
>>>>
>>>> [1] Implement `readAll` in all file-based IOs to be user friendly and
>>>> accept the (minor) maintenance cost
>>>>
>>>> or
>>>>
>>>> [2] Deprecate `readAll` in file-based IOs and encourage users to use
>>>> FileIO + `readFiles` (less maintenance, and it encourages composition).
>>>>
>>>> I quickly checked the Python code base but could not tell whether the
>>>> file match + ReadFiles pattern applies there; it would be nice to see
>>>> what the Python folks think about this too.
>>>>
>>>> This discussion comes from a recent Slack conversation with Łukasz
>>>> Gajowy, and we wanted to settle on one approach to make the IO
>>>> signatures consistent, so any opinions/preferences?
>>>>
>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
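As described in the thread, `TextIO.readAll()` in the Java SDK is essentially FileIO matching followed by `TextIO.readFiles()`. The sketch below writes that composition out explicitly, which is roughly what a user would do after the deprecation in BEAM-6605. It is a minimal sketch, assuming `beam-sdks-java-core` and `beam-runners-direct-java` are on the classpath. The class names `ReadFilesExample`, `ReadAllText` and `Collect`, the `run` helper, and the file names are hypothetical illustrations, not Beam API. Collecting output into a static queue only works because the DirectRunner executes in the same JVM. A real pipeline would write its output with a sink, or verify it with `PAssert` in tests.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

public class ReadFilesExample {

  // Hypothetical composite: the match + read steps that readAll() hides,
  // spelled out with FileIO and TextIO.readFiles().
  static class ReadAllText extends PTransform<PCollection<String>, PCollection<String>> {
    @Override
    public PCollection<String> expand(PCollection<String> filepatterns) {
      return filepatterns
          .apply("MatchAll", FileIO.matchAll())        // file pattern -> MatchResult.Metadata
          .apply("ReadMatches", FileIO.readMatches())  // Metadata -> FileIO.ReadableFile
          .apply("ReadLines", TextIO.readFiles());     // ReadableFile -> individual lines
    }
  }

  // Hypothetical sink for this demo only: valid because DirectRunner runs in this JVM.
  static final Queue<String> COLLECTED = new ConcurrentLinkedQueue<>();

  static class Collect extends DoFn<String, Void> {
    @ProcessElement
    public void processElement(@Element String line) {
      COLLECTED.add(line);
    }
  }

  // Writes two small files into dir, then reads them back through two file patterns.
  static void run(Path dir) throws IOException {
    Files.write(dir.resolve("a1.txt"), Arrays.asList("alpha", "beta"), StandardCharsets.UTF_8);
    Files.write(dir.resolve("b1.txt"), Arrays.asList("gamma"), StandardCharsets.UTF_8);

    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());
    p.apply(Create.of(dir.resolve("a*.txt").toString(), dir.resolve("b*.txt").toString()))
        .apply(new ReadAllText())
        .apply(ParDo.of(new Collect()));
    p.run().waitUntilFinish();
  }

  public static void main(String[] args) throws IOException {
    run(Files.createTempDirectory("readfiles"));
    // Element order is not guaranteed across bundles.
    System.out.println(COLLECTED);
  }
}
```

Keeping the three steps visible is the composability argument from option [2]. A user can insert a step between matching and reading, such as filtering the matched `MatchResult.Metadata` or configuring decompression on `readMatches()`, without the IO needing a dedicated `readAll` variant.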