So, based on the comments in the PR, the underlying issue seems to be 'FileBasedSink.convertToFileResourceIfPossible(stringCoder.decode(inStream));' not returning the correct result, right ? If so I think the correct fix might be your proposal (2) - Try to fix the underlying filesystem to do a better job of file/dir matching
This is a bug we probably have to fix anyways for the local filesystem and/or HDFS and this will also give us a solution that does not break update compatibility. Thanks, Cham On Wed, Aug 5, 2020 at 3:41 PM Luke Cwik <[email protected]> wrote: > Cham, that was one of the options I had mentioned on the PR. The > difference here is that this is a bug fix and existing users could be > broken unknowingly so it might be worthwhile to take that breaking change > (and possibly provide users a way to perform an upgrade using the old > implementation). > > > On Wed, Aug 5, 2020 at 3:33 PM Chamikara Jayalath <[email protected]> > wrote: > >> This might break the update compatibility for Dataflow streaming >> pipelines. +Reuven Lax <[email protected]> +Lukasz Cwik >> <[email protected]> >> >> In other cases, to save update compatibility, we introduced a user option >> that changes the coder only when the user explicitly asks for an updated >> feature that requires the new coder. For example, >> https://github.com/apache/beam/commit/304882caa89afe24150062b959ee915c79e72ab3 >> >> Thanks, >> Cham >> >> >> On Mon, Aug 3, 2020 at 10:00 AM David Janíček <[email protected]> >> wrote: >> >>> Hello everyone, >>> >>> I've reported an issue https://issues.apache.org/jira/browse/BEAM-10292 >>> which is about broken DefaultFilenamePolicy.ParamsCoder behavior. >>> DefaultFilenamePolicy.ParamsCoder loses information whether >>> DefaultFilenamePolicy.Params's baseFilename resource is file or directory >>> on some filesystems, at least on local FS and HDFS. >>> >>> After discussion with @dmvk and @lukecwik, we have agreed that the best >>> solution could be to take the breaking change and use ResourceIdCoder for >>> encoding/decoding DefaultFilenamePolicy.Params's baseFilename, this way the >>> file/directory information is preserved. >>> The solution is implemented in pull request >>> https://github.com/apache/beam/pull/12050. >>> >>> I'd like to ask if there is a consensus on this breaking change. Is >>> everyone OK with this? >>> Thanks in advance for answers. >>> >>> Best regards, >>> David >>> >>
