Hi,

IMHO readFiles() should be used. We should remove readAll() to avoid confusion.
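To make the point concrete, here is a rough sketch of why readAll() adds nothing beyond composition: it is only a match step followed by a read step. This is a toy model in plain java.nio, not Beam code, and every name in it (`ReadAllSketch`, `match`, `readFiles`, `readAll`, the `dir` parameter) is made up for illustration. In Beam itself, as far as I know, the equivalent chain for text files is `FileIO.matchAll()`, then `FileIO.readMatches()`, then `TextIO.readFiles()`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only: plain java.nio, no Beam types. All names are hypothetical.
final class ReadAllSketch {

    // "match" step: expand one glob (applied to file names in dir) into files.
    static List<Path> match(Path dir, String glob) {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(files); // deterministic order, for the sketch only
        return files;
    }

    // "readFiles" step: read every line of every already-matched file.
    static List<String> readFiles(List<Path> files) {
        List<String> lines = new ArrayList<>();
        for (Path file : files) {
            try {
                lines.addAll(Files.readAllLines(file));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return lines;
    }

    // "readAll" convenience wrapper: nothing more than match + readFiles.
    static List<String> readAll(Path dir, List<String> globs) {
        List<String> lines = new ArrayList<>();
        for (String glob : globs) {
            lines.addAll(readFiles(match(dir, glob)));
        }
        return lines;
    }
}
```

Calling `ReadAllSketch.readAll(dir, globs)` returns the same lines as calling `readFiles(match(dir, glob))` for each glob, which is the composition users would otherwise write by hand.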
Regards
JB

On 30/01/2019 17:25, Ismaël Mejía wrote:
> Hello,
>
> A ‘recent’ usage pattern in Beam is for file-based IOs to offer a
> `readAll()` implementation that takes a `PCollection` of file
> patterns, matches them, and reads the resulting files, e.g. `TextIO`,
> `AvroIO`. `ReadAll` is implemented by an expand function that matches
> files with FileIO and then reads them using a format-specific
> `ReadFiles` transform, e.g. TextIO.ReadFiles, AvroIO.ReadFiles. So in
> the end `ReadAll` in the Java implementation is just a user-friendly
> API that hides FileIO.match + ReadFiles.
>
> Most recent IOs do NOT implement ReadAll, to encourage the more
> composable approach of FileIO + ReadFiles, e.g. XmlIO and ParquetIO.
>
> Implementing ReadAll as a wrapper is relatively easy and is definitely
> user-friendly, but it has drawbacks: it may be error-prone and it adds
> more code to maintain (mostly ‘repeated’ code). However, `readAll` is a
> more abstract pattern that applies not only to file-based IOs, so it
> makes sense, for example, in other transforms that map a `PCollection`
> of read requests, and it is the basis for SDF-style composable APIs
> like the recent `HBaseIO.readAll()`.
>
> So the question is, should we:
>
> [1] Implement `readAll` in all file-based IOs to be user-friendly and
> accept the (minor) maintenance cost
>
> or
>
> [2] Deprecate `readAll` in file-based IOs and encourage users to use
> FileIO + `readFiles` (less maintenance, and it encourages composition).
>
> I had a quick look at the Python code base but could not tell whether
> the FileIO match + ReadFiles pattern applies there, so it would be
> nice to hear what the Python folks think about this too.
>
> This discussion comes from a recent Slack conversation with Łukasz
> Gajowy, and we wanted to settle on one approach to make the IO
> signatures consistent, so any opinions/preferences?
>

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com