lostluck commented on code in PR #25809: URL: https://github.com/apache/beam/pull/25809#discussion_r1141405566
########## sdks/go/pkg/beam/io/filesystem/file.go: ########## @@ -0,0 +1,44 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package filesystem + +import ( + "reflect" + + "github.com/apache/beam/sdks/v2/go/pkg/beam" +) + +func init() { + beam.RegisterType(reflect.TypeOf((*FileMetadata)(nil)).Elem()) +} + +// FileMetadata contains metadata about a file, namely its path and size in bytes. +type FileMetadata struct { Review Comment: Is the longer term goal to have FileMetadata manipulated by the filesystem abstractions? If not, we should just move this to the `fileio` package. It's not typical in Go to define a type that's not used within the package itself. Exported types and fields are a package's API, and in this case, it's not controlling anything here. Is there a pattern from Java or Python being replicated? ########## sdks/go/pkg/beam/io/fileio/match.go: ########## @@ -0,0 +1,174 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +// Package fileio provides transforms for matching and reading files. +package fileio + +import ( + "context" + "fmt" + "strings" + + "github.com/apache/beam/sdks/v2/go/pkg/beam" + "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem" + "github.com/apache/beam/sdks/v2/go/pkg/beam/register" +) + +func init() { + register.DoFn3x1[context.Context, string, func(filesystem.FileMetadata), error](&matchFn{}) + register.Emitter1[filesystem.FileMetadata]() +} + +// EmptyMatchTreatment controls how empty matches of a pattern are treated. +type EmptyMatchTreatment int + +const ( + // EmptyMatchTreatmentAllow allows empty matches. + EmptyMatchTreatmentAllow EmptyMatchTreatment = iota + // EmptyMatchTreatmentDisallow disallows empty matches. + EmptyMatchTreatmentDisallow + // EmptyMatchTreatmentAllowIfWildcard allows empty matches if the pattern contains a wildcard. + EmptyMatchTreatmentAllowIfWildcard +) + +type matchOption struct { + EmptyMatchTreatment EmptyMatchTreatment +} + +// MatchOptionFn is a function that can be passed to MatchFiles or MatchAll to configure options for +// matching files. +type MatchOptionFn func(*matchOption) error + +// WithEmptyMatchTreatment specifies how empty matches of a pattern should be treated. By default, +// empty matches are allowed if the pattern contains a wildcard. +func WithEmptyMatchTreatment(treatment EmptyMatchTreatment) MatchOptionFn { Review Comment: This applies to the ReadOptions and treatments as well. A user passes the option in like `fileio.WithEmptyMatchTreatment(fileio.EmptyMatchTreatmentAllowIfWildcard)` and is 73 characters by itself. That's very long, and repeats itself several times. And then we have all the options with field names matching the types directly. What if instead of having an exported enum consts we just have the set of helper functions directly? Having users instead call `fileio.EmptyMatchAllowedIfWildcard` remains as clear, avoids extra structural words like "With" and "Treatment", keeps a common prefix for keeping related options together in autocompletes. The underlying mechanism can still be managed with an enum and such, but then we don't expose unnecessary implementation details to users. This also reduces the odds of users passing in nonsense values for the enum, since when the `EmptyMatchTreatment` is exported, someone can cast that type. It's unlikely for this to happen, but why give them the option? Restricting the space of possibility makes APIs clearer to understand and use, since there are fewer opportunities for misuse. One could argue this makes it impossible for users to provide their own options... but the function that takes constant doesn't enable this, since a developer would still need to add new code to the package anyway. For end user packages, it's the end user's writing experience that needs to be optimized not the package's author (AKA us). And this just means adding 3 more lines (the function call), and moves the documentation directly anyway. --------- Two implementation options: Simple function pointers (AKA the function is the option func directly) ``` func EmptyMatchAllowedIfWildcard(mo *matchOption) error { o.EmptyMatchTreatment = emptyMatchTreatmentAllowIfWildcard } ``` or more similar to the current implementation, returning the option fn. ``` func EmptyMatchAllowedIfWildcard() MatchOptionFn { return func(o *matchOption) error { o.EmptyMatchTreatment = emptyMatchTreatmentAllowIfWildcard return nil } } ``` Or the same with a helper that's the same as the current `WithEmptyMatchTreatment`. ``` func EmptyMatchAllowedIfWildcard() MatchOptionFn { return withEmptyMatchTreatment(emptyMatchTreatmentAllowIfWildcard) return nil } } ``` Personally, I think in this case, the former is preferred since it's just unnecessary indirection. It avoids two characters. On the other hand, a gold standard in managing optional configuration is the `cmp` package, and it's tag along package `cmpopts`. Those defined their Options as an interface type with no exported methods (but one [internal filter method](https://github.com/google/go-cmp/blob/v0.5.9/cmp/options.go#L25)). So the lesson here is it's probably better to have users call a function that returns the option type, instead of it being direct, if only to better match user expectations around this sort of thing. So I'd say have users call functions to get the option func, if only to enable unforeseen changes in the future. Now I want to re-write the PAssert package with that in mind.... ---- Finally, since we end up hiding the enums for internal use only, we can reduce their names down. Eg. `emptyMatchTreatmentAllowIfWildcard` can be `emtAllowIfWildcard` (The type name can remain `emptyMatchTreatment`., and same for `directoryTreatment` with `dt` prefixes. Compression is the odd one out here (no good feelings on this), but if we can't think of a good reason for it not to follow the same pattern, we should be consistent with it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
