I took a look at the PR and suggested some changes.

On Tue, Aug 18, 2020 at 8:16 AM David Janíček <[email protected]>
wrote:

> I looked at the possibility to fix the underlying filesystem and it turns
> out that only the local filesystem couldn't handle decoding right, HDFS and
> some other filesystem, e.g. S3, already have a check for that.
> So I added a similar check to the local filesystem too. The implementation
> is in the same pull request https://github.com/apache/beam/pull/12050.
>
> Can you take a look at it, please?
>
> Thanks,
> David
>
> út 11. 8. 2020 v 19:39 odesílatel Luke Cwik <[email protected]> napsal:
>
>> The filesystem "fixes" all surmount to removing the "isDirectory" boolean
>> bit and encoding whether something is a directory in the string part of the
>> resource specification which also turns out to be backwards incompatible
>> (just in a different way).
>>
>> Removing the "directory" bit would be great and that would allow us to
>> use strings instead of resource ids but would require filesystems to
>> perform the mapping from some standard path specification to their internal
>> representation.
>>
>> On Wed, Aug 5, 2020 at 9:26 PM Chamikara Jayalath <[email protected]>
>> wrote:
>>
>>> So, based on the comments in the PR, the underlying issue seems to be
>>> 'FileBasedSink.convertToFileResourceIfPossible(stringCoder.decode(inStream));'
>>> not returning the correct result, right ?
>>> If so I think the correct fix might be your proposal (2) - Try to fix
>>> the underlying filesystem to do a better job of file/dir matching
>>>
>>> This is a bug we probably have to fix anyways for the local filesystem
>>> and/or HDFS and this will also give us a solution that does not break
>>> update compatibility.
>>>
>>> Thanks,
>>> Cham
>>>
>>> On Wed, Aug 5, 2020 at 3:41 PM Luke Cwik <[email protected]> wrote:
>>>
>>>> Cham, that was one of the options I had mentioned on the PR. The
>>>> difference here is that this is a bug fix and existing users could be
>>>> broken unknowingly so it might be worthwhile to take that breaking change
>>>> (and possibly provide users a way to perform an upgrade using the old
>>>> implementation).
>>>>
>>>>
>>>> On Wed, Aug 5, 2020 at 3:33 PM Chamikara Jayalath <[email protected]>
>>>> wrote:
>>>>
>>>>> This might break the update compatibility for Dataflow streaming
>>>>> pipelines. +Reuven Lax <[email protected]>  +Lukasz Cwik
>>>>> <[email protected]>
>>>>>
>>>>> In other cases, to save update compatibility, we introduced a user
>>>>> option that changes the coder only when the user explicitly asks for an
>>>>> updated feature that requires the new coder. For example,
>>>>> https://github.com/apache/beam/commit/304882caa89afe24150062b959ee915c79e72ab3
>>>>>
>>>>> Thanks,
>>>>> Cham
>>>>>
>>>>>
>>>>> On Mon, Aug 3, 2020 at 10:00 AM David Janíček <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hello everyone,
>>>>>>
>>>>>> I've reported an issue
>>>>>> https://issues.apache.org/jira/browse/BEAM-10292 which is about
>>>>>> broken DefaultFilenamePolicy.ParamsCoder behavior.
>>>>>> DefaultFilenamePolicy.ParamsCoder loses information whether
>>>>>> DefaultFilenamePolicy.Params's baseFilename resource is file or directory
>>>>>> on some filesystems, at least on local FS and HDFS.
>>>>>>
>>>>>> After discussion with @dmvk and @lukecwik, we have agreed that the
>>>>>> best solution could be to take the breaking change and use 
>>>>>> ResourceIdCoder
>>>>>> for encoding/decoding DefaultFilenamePolicy.Params's baseFilename, this 
>>>>>> way
>>>>>> the file/directory information is preserved.
>>>>>> The solution is implemented in pull request
>>>>>> https://github.com/apache/beam/pull/12050.
>>>>>>
>>>>>> I'd like to ask if there is a consensus on this breaking change. Is
>>>>>> everyone OK with this?
>>>>>> Thanks in advance for answers.
>>>>>>
>>>>>> Best regards,
>>>>>> David
>>>>>>
>>>>>
>
> --
> S pozdravem
> David Janíček
>

Reply via email to