I took a look at the PR and suggested some changes. On Tue, Aug 18, 2020 at 8:16 AM David Janíček <[email protected]> wrote:
> I looked at the possibility to fix the underlying filesystem and it turns > out that only the local filesystem couldn't handle decoding right, HDFS and > some other filesystem, e.g. S3, already have a check for that. > So I added a similar check to the local filesystem too. The implementation > is in the same pull request https://github.com/apache/beam/pull/12050. > > Can you take a look at it, please? > > Thanks, > David > > út 11. 8. 2020 v 19:39 odesílatel Luke Cwik <[email protected]> napsal: > >> The filesystem "fixes" all surmount to removing the "isDirectory" boolean >> bit and encoding whether something is a directory in the string part of the >> resource specification which also turns out to be backwards incompatible >> (just in a different way). >> >> Removing the "directory" bit would be great and that would allow us to >> use strings instead of resource ids but would require filesystems to >> perform the mapping from some standard path specification to their internal >> representation. >> >> On Wed, Aug 5, 2020 at 9:26 PM Chamikara Jayalath <[email protected]> >> wrote: >> >>> So, based on the comments in the PR, the underlying issue seems to be >>> 'FileBasedSink.convertToFileResourceIfPossible(stringCoder.decode(inStream));' >>> not returning the correct result, right ? >>> If so I think the correct fix might be your proposal (2) - Try to fix >>> the underlying filesystem to do a better job of file/dir matching >>> >>> This is a bug we probably have to fix anyways for the local filesystem >>> and/or HDFS and this will also give us a solution that does not break >>> update compatibility. >>> >>> Thanks, >>> Cham >>> >>> On Wed, Aug 5, 2020 at 3:41 PM Luke Cwik <[email protected]> wrote: >>> >>>> Cham, that was one of the options I had mentioned on the PR. The >>>> difference here is that this is a bug fix and existing users could be >>>> broken unknowingly so it might be worthwhile to take that breaking change >>>> (and possibly provide users a way to perform an upgrade using the old >>>> implementation). >>>> >>>> >>>> On Wed, Aug 5, 2020 at 3:33 PM Chamikara Jayalath <[email protected]> >>>> wrote: >>>> >>>>> This might break the update compatibility for Dataflow streaming >>>>> pipelines. +Reuven Lax <[email protected]> +Lukasz Cwik >>>>> <[email protected]> >>>>> >>>>> In other cases, to save update compatibility, we introduced a user >>>>> option that changes the coder only when the user explicitly asks for an >>>>> updated feature that requires the new coder. For example, >>>>> https://github.com/apache/beam/commit/304882caa89afe24150062b959ee915c79e72ab3 >>>>> >>>>> Thanks, >>>>> Cham >>>>> >>>>> >>>>> On Mon, Aug 3, 2020 at 10:00 AM David Janíček <[email protected]> >>>>> wrote: >>>>> >>>>>> Hello everyone, >>>>>> >>>>>> I've reported an issue >>>>>> https://issues.apache.org/jira/browse/BEAM-10292 which is about >>>>>> broken DefaultFilenamePolicy.ParamsCoder behavior. >>>>>> DefaultFilenamePolicy.ParamsCoder loses information whether >>>>>> DefaultFilenamePolicy.Params's baseFilename resource is file or directory >>>>>> on some filesystems, at least on local FS and HDFS. >>>>>> >>>>>> After discussion with @dmvk and @lukecwik, we have agreed that the >>>>>> best solution could be to take the breaking change and use >>>>>> ResourceIdCoder >>>>>> for encoding/decoding DefaultFilenamePolicy.Params's baseFilename, this >>>>>> way >>>>>> the file/directory information is preserved. >>>>>> The solution is implemented in pull request >>>>>> https://github.com/apache/beam/pull/12050. >>>>>> >>>>>> I'd like to ask if there is a consensus on this breaking change. Is >>>>>> everyone OK with this? >>>>>> Thanks in advance for answers. >>>>>> >>>>>> Best regards, >>>>>> David >>>>>> >>>>> > > -- > S pozdravem > David Janíček >
