I looked at the possibility to fix the underlying filesystem and it turns out that only the local filesystem couldn't handle decoding right, HDFS and some other filesystem, e.g. S3, already have a check for that. So I added a similar check to the local filesystem too. The implementation is in the same pull request https://github.com/apache/beam/pull/12050.
Can you take a look at it, please? Thanks, David út 11. 8. 2020 v 19:39 odesílatel Luke Cwik <[email protected]> napsal: > The filesystem "fixes" all surmount to removing the "isDirectory" boolean > bit and encoding whether something is a directory in the string part of the > resource specification which also turns out to be backwards incompatible > (just in a different way). > > Removing the "directory" bit would be great and that would allow us to use > strings instead of resource ids but would require filesystems to perform > the mapping from some standard path specification to their internal > representation. > > On Wed, Aug 5, 2020 at 9:26 PM Chamikara Jayalath <[email protected]> > wrote: > >> So, based on the comments in the PR, the underlying issue seems to be >> 'FileBasedSink.convertToFileResourceIfPossible(stringCoder.decode(inStream));' >> not returning the correct result, right ? >> If so I think the correct fix might be your proposal (2) - Try to fix the >> underlying filesystem to do a better job of file/dir matching >> >> This is a bug we probably have to fix anyways for the local filesystem >> and/or HDFS and this will also give us a solution that does not break >> update compatibility. >> >> Thanks, >> Cham >> >> On Wed, Aug 5, 2020 at 3:41 PM Luke Cwik <[email protected]> wrote: >> >>> Cham, that was one of the options I had mentioned on the PR. The >>> difference here is that this is a bug fix and existing users could be >>> broken unknowingly so it might be worthwhile to take that breaking change >>> (and possibly provide users a way to perform an upgrade using the old >>> implementation). >>> >>> >>> On Wed, Aug 5, 2020 at 3:33 PM Chamikara Jayalath <[email protected]> >>> wrote: >>> >>>> This might break the update compatibility for Dataflow streaming >>>> pipelines. +Reuven Lax <[email protected]> +Lukasz Cwik >>>> <[email protected]> >>>> >>>> In other cases, to save update compatibility, we introduced a user >>>> option that changes the coder only when the user explicitly asks for an >>>> updated feature that requires the new coder. For example, >>>> https://github.com/apache/beam/commit/304882caa89afe24150062b959ee915c79e72ab3 >>>> >>>> Thanks, >>>> Cham >>>> >>>> >>>> On Mon, Aug 3, 2020 at 10:00 AM David Janíček <[email protected]> >>>> wrote: >>>> >>>>> Hello everyone, >>>>> >>>>> I've reported an issue >>>>> https://issues.apache.org/jira/browse/BEAM-10292 which is about >>>>> broken DefaultFilenamePolicy.ParamsCoder behavior. >>>>> DefaultFilenamePolicy.ParamsCoder loses information whether >>>>> DefaultFilenamePolicy.Params's baseFilename resource is file or directory >>>>> on some filesystems, at least on local FS and HDFS. >>>>> >>>>> After discussion with @dmvk and @lukecwik, we have agreed that the >>>>> best solution could be to take the breaking change and use ResourceIdCoder >>>>> for encoding/decoding DefaultFilenamePolicy.Params's baseFilename, this >>>>> way >>>>> the file/directory information is preserved. >>>>> The solution is implemented in pull request >>>>> https://github.com/apache/beam/pull/12050. >>>>> >>>>> I'd like to ask if there is a consensus on this breaking change. Is >>>>> everyone OK with this? >>>>> Thanks in advance for answers. >>>>> >>>>> Best regards, >>>>> David >>>>> >>>> -- S pozdravem David Janíček
