Since it seems we have consensus on deprecating both transforms, I created:

BEAM-6605 Deprecate TextIO.readAll() and TextIO.ReadAll transform
BEAM-6606 Deprecate AvroIO.readAll() and AvroIO.ReadAll transform
Thanks everyone.

On Fri, Feb 1, 2019 at 7:03 PM Chamikara Jayalath <chamik...@google.com> wrote:
>
> Python SDK doesn't have FileIO yet so let's keep ReadAllFromFoo transforms
> currently available for various file types around till we have that.
>
> Thanks,
> Cham
>
> On Fri, Feb 1, 2019 at 7:41 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>
>> Hi,
>>
>> readFiles() should be used IMHO. We should remove readAll() to avoid
>> confusion.
>>
>> Regards
>> JB
>>
>> On 30/01/2019 17:25, Ismaël Mejía wrote:
>> > Hello,
>> >
>> > A ‘recent’ pattern of use in Beam is to have in file based IOs a
>> > `readAll()` implementation that basically matches a `PCollection` of
>> > file patterns and reads them, e.g. `TextIO`, `AvroIO`. `ReadAll` is
>> > implemented by an expand function that matches files with FileIO and
>> > then reads them using a format specific `ReadFiles` transform e.g.
>> > TextIO.ReadFiles, AvroIO.ReadFiles. So in the end `ReadAll` in the
>> > Java implementation is just a user-friendly API to hide FileIO.match
>> > + ReadFiles.
>> >
>> > Most recent IOs do NOT implement ReadAll to encourage the more
>> > composable approach of FileIO + ReadFiles, e.g. XmlIO and ParquetIO.
>> >
>> > Implementing ReadAll as a wrapper is relatively easy and is definitely
>> > user friendly, but it has issues: it may be error-prone and it adds
>> > more code to maintain (mostly ‘repeated’ code). However `readAll` is a
>> > more abstract pattern that applies not only to file based IOs so it
>> > makes sense for example in other transforms that map a `PCollection`
>> > of read requests and is the basis for SDF composable style APIs like
>> > the recent `HBaseIO.readAll()`.
>> >
>> > So the question is, should we:
>> >
>> > [1] Implement `readAll` in all file based IOs to be user friendly and
>> > assume the (minor) maintenance cost
>> >
>> > or
>> >
>> > [2] Deprecate `readAll` from file based IOs and encourage users to use
>> > FileIO + `readFiles` (less maintenance and encourages composition).
>> >
>> > I just checked quickly in the Python code base but I did not find if
>> > the FileIO match + ReadFiles pattern applies, but it would be nice to
>> > see what the Python guys think on this too.
>> >
>> > This discussion comes from a recent Slack conversation with Łukasz
>> > Gajowy, and we wanted to settle on one approach to make the IO
>> > signatures consistent, so any opinions/preferences?
>> >
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
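
For reference, the FileIO + `readFiles` composition discussed in this thread might look roughly like the following in the Java SDK. This is only a minimal sketch of the pattern, not the actual `ReadAll` expansion: the class name `ReadFilesSketch`, the helper `readLines`, the step names, and the path `/tmp/input/*.txt` are made up for illustration.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

public class ReadFilesSketch {

  /**
   * Hypothetical helper: expands a PCollection of file patterns and reads
   * every matched file line by line, using FileIO + TextIO.readFiles()
   * instead of TextIO.readAll().
   */
  static PCollection<String> readLines(PCollection<String> filePatterns) {
    return filePatterns
        // Resolve each pattern to the metadata of the files it matches.
        .apply("MatchPatterns", FileIO.matchAll())
        // Turn the match results into readable file handles.
        .apply("OpenFiles", FileIO.readMatches())
        // Format-specific step: read each file as lines of text.
        .apply("ReadLines", TextIO.readFiles());
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();

    // Illustrative input pattern; replace with real locations.
    PCollection<String> patterns = pipeline.apply(Create.of("/tmp/input/*.txt"));

    PCollection<String> lines = readLines(patterns);

    pipeline.run().waitUntilFinish();
  }
}
```

Running `main` assumes a runner (for example the direct runner) is on the classpath. For another file format, only the last step should need to change, e.g. to the corresponding `AvroIO` or `ParquetIO` `readFiles` variant with its own arguments.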