+1

Thanks for that, Ismaël.

Regards
JB

On 06/02/2019 11:24, Ismaël Mejía wrote:
> Since it seems we have consensus on deprecating both transforms, I created:
> 
> BEAM-6605 Deprecate TextIO.readAll() and TextIO.ReadAll transform
> BEAM-6606 Deprecate AvroIO.readAll() and AvroIO.ReadAll transform
> 
> Thanks everyone.
> 
> On Fri, Feb 1, 2019 at 7:03 PM Chamikara Jayalath <chamik...@google.com> 
> wrote:
>>
>> The Python SDK doesn't have FileIO yet, so let's keep the ReadAllFromFoo
>> transforms currently available for various file types until we have that.
>>
>> Thanks,
>> Cham
>>
>> On Fri, Feb 1, 2019 at 7:41 AM Jean-Baptiste Onofré <j...@nanthrax.net> 
>> wrote:
>>>
>>> Hi,
>>>
>>> readFiles() should be used IMHO. We should remove readAll() to avoid
>>> confusion.
>>>
>>> Regards
>>> JB
>>>
>>> On 30/01/2019 17:25, Ismaël Mejía wrote:
>>>> Hello,
>>>>
>>>> A ‘recent’ usage pattern in Beam is for file-based IOs (e.g. `TextIO`,
>>>> `AvroIO`) to provide a `readAll()` implementation that matches a
>>>> `PCollection` of file patterns and reads the matching files. `ReadAll`
>>>> is implemented by an expand function that matches files with FileIO
>>>> and then reads them using a format-specific `ReadFiles` transform,
>>>> e.g. TextIO.ReadFiles, AvroIO.ReadFiles. So in the end `ReadAll` in
>>>> the Java implementation is just a user-friendly API that hides
>>>> FileIO.match + ReadFiles.
>>>>
>>>> Most recent IOs do NOT implement ReadAll, to encourage the more
>>>> composable approach of FileIO + ReadFiles, e.g. XmlIO and ParquetIO.
>>>>
>>>> Implementing ReadAll as a wrapper is relatively easy and is definitely
>>>> user-friendly, but it has an issue: it may be error-prone and it adds
>>>> more code to maintain (mostly ‘repeated’ code). However, `readAll` is a
>>>> more abstract pattern that applies not only to file-based IOs, so it
>>>> makes sense, for example, in other transforms that map a `PCollection`
>>>> of read requests, and it is the basis for SDF composable-style APIs
>>>> like the recent `HBaseIO.readAll()`.
>>>>
>>>> So the question is should we:
>>>>
>>>> [1] Implement `readAll` in all file-based IOs to be user-friendly and
>>>> accept the (minor) maintenance cost
>>>>
>>>> or
>>>>
>>>> [2] Deprecate `readAll` in file-based IOs and encourage users to use
>>>> FileIO + `readFiles` (less maintenance, and it encourages composition).
>>>>
>>>> I had a quick look at the Python code base but could not tell whether
>>>> the FileIO match + ReadFiles pattern applies there; it would be nice
>>>> to hear what the Python guys think about this too.
>>>>
>>>> This discussion comes from a recent Slack conversation with Łukasz
>>>> Gajowy. We wanted to settle on one approach to make the IO
>>>> signatures consistent, so any opinions/preferences?
>>>>
>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
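
For readers of the thread, the composition it describes (`ReadAll` as a thin wrapper over "match file patterns, then read the matched files") can be sketched in plain Java. This is only an analogy built on `java.nio.file`, not Beam code: the class `ReadAllSketch` and its methods `match`, `readFiles` and `readAll` are hypothetical names, and in-memory lists stand in for `PCollection`s.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ReadAllSketch {

    // Step 1: expand glob patterns (relative to baseDir) into concrete files.
    // Loosely plays the role that the FileIO matching step plays in the thread.
    static List<Path> match(Path baseDir, List<String> globs) throws IOException {
        Set<Path> matched = new TreeSet<>(); // sorted, and de-duplicated across globs
        for (String glob : globs) {
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(baseDir, glob)) {
                for (Path p : stream) {
                    matched.add(p);
                }
            }
        }
        return new ArrayList<>(matched);
    }

    // Step 2: the format-specific part; here, "text" means one record per line.
    // Loosely plays the role of a ReadFiles transform.
    static List<String> readFiles(List<Path> files) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Path f : files) {
            lines.addAll(Files.readAllLines(f)); // UTF-8 by default
        }
        return lines;
    }

    // The convenience wrapper: nothing more than step 1 followed by step 2.
    static List<String> readAll(Path baseDir, List<String> globs) throws IOException {
        return readFiles(match(baseDir, globs));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("readall-sketch");
        Files.write(dir.resolve("a.txt"), Arrays.asList("one", "two"));
        Files.write(dir.resolve("b.txt"), Arrays.asList("three"));
        Files.write(dir.resolve("c.csv"), Arrays.asList("ignored"));

        // Only the *.txt files are matched and read.
        System.out.println(readAll(dir, Arrays.asList("*.txt")));
    }
}
```

Because `readAll` adds nothing beyond chaining the two steps, every file format that offers it repeats the same wrapper, which is the maintenance cost raised in the thread; exposing only `match` and `readFiles` leaves the chaining to the caller.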
