lostluck commented on code in PR #25809:
URL: https://github.com/apache/beam/pull/25809#discussion_r1141405566


##########
sdks/go/pkg/beam/io/filesystem/file.go:
##########
@@ -0,0 +1,44 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package filesystem
+
+import (
+       "reflect"
+
+       "github.com/apache/beam/sdks/v2/go/pkg/beam"
+)
+
+func init() {
+       beam.RegisterType(reflect.TypeOf((*FileMetadata)(nil)).Elem())
+}
+
+// FileMetadata contains metadata about a file, namely its path and size in 
bytes.
+type FileMetadata struct {

Review Comment:
   Is the longer term goal to have FileMetadata manipulated by the filesystem 
abstractions? If not, we should just move this to the `fileio` package. It's 
not typical in Go to define a type that's not used within the package itself. 
Exported types and fields are a package's API, and in this case, it's not 
controlling anything here.
   
   Is there a pattern from Java or Python being replicated?



##########
sdks/go/pkg/beam/io/fileio/match.go:
##########
@@ -0,0 +1,174 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// Package fileio provides transforms for matching and reading files.
+package fileio
+
+import (
+       "context"
+       "fmt"
+       "strings"
+
+       "github.com/apache/beam/sdks/v2/go/pkg/beam"
+       "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem"
+       "github.com/apache/beam/sdks/v2/go/pkg/beam/register"
+)
+
+func init() {
+       register.DoFn3x1[context.Context, string, 
func(filesystem.FileMetadata), error](&matchFn{})
+       register.Emitter1[filesystem.FileMetadata]()
+}
+
+// EmptyMatchTreatment controls how empty matches of a pattern are treated.
+type EmptyMatchTreatment int
+
+const (
+       // EmptyMatchTreatmentAllow allows empty matches.
+       EmptyMatchTreatmentAllow EmptyMatchTreatment = iota
+       // EmptyMatchTreatmentDisallow disallows empty matches.
+       EmptyMatchTreatmentDisallow
+       // EmptyMatchTreatmentAllowIfWildcard allows empty matches if the 
pattern contains a wildcard.
+       EmptyMatchTreatmentAllowIfWildcard
+)
+
+type matchOption struct {
+       EmptyMatchTreatment EmptyMatchTreatment
+}
+
+// MatchOptionFn is a function that can be passed to MatchFiles or MatchAll to 
configure options for
+// matching files.
+type MatchOptionFn func(*matchOption) error
+
+// WithEmptyMatchTreatment specifies how empty matches of a pattern should be 
treated. By default,
+// empty matches are allowed if the pattern contains a wildcard.
+func WithEmptyMatchTreatment(treatment EmptyMatchTreatment) MatchOptionFn {

Review Comment:
   This applies to the ReadOptions and treatments as well.
   
   A user passes the option in like 
`fileio.WithEmptyMatchTreatment(fileio.EmptyMatchTreatmentAllowIfWildcard)` and 
is 73 characters by itself. That's very long, and repeats itself several times.
   
   And then we have all the options with field names matching the types 
directly. What if instead of having an exported enum consts we just have the 
set of helper functions directly?
   
   Having users instead call `fileio.EmptyMatchAllowedIfWildcard` remains as 
clear, avoids extra structural words like "With" and "Treatment", keeps a 
common prefix for keeping related options together in autocompletes.
   
   The underlying mechanism can still be managed with an enum and such, but 
then we don't expose unnecessary implementation details to users. This also 
reduces the odds of users passing in nonsense values for the enum, since when 
the `EmptyMatchTreatment` is exported, someone can cast that type. It's 
unlikely for this to happen, but why give them the option? Restricting the 
space of possibility makes APIs clearer to understand and use, since there are 
fewer opportunities for misuse.
   
   One could argue this makes it impossible for users to provide their own 
options... but the function that takes constant doesn't enable this, since a 
developer would still need to add new code to the package anyway. For end user 
packages, it's the end user's writing experience that needs to be optimized not 
the package's author (AKA us). And this just means adding 3 more lines (the 
function call), and moves the documentation directly anyway.
   
   ---------
   
   Two implementation options: Simple function pointers (AKA the function is 
the option func directly)
   
   ```
   func EmptyMatchAllowedIfWildcard(mo *matchOption) error {
      o.EmptyMatchTreatment = emptyMatchTreatmentAllowIfWildcard
   }
   ```
   
   or more similar to the current implementation, returning the option fn.
   
   ```
   func EmptyMatchAllowedIfWildcard() MatchOptionFn {
        return func(o *matchOption) error {
                o.EmptyMatchTreatment = emptyMatchTreatmentAllowIfWildcard
                return nil
        }
   }
   ```
   
   Or the same with a helper that's the same as the current 
`WithEmptyMatchTreatment`.
   
   ```
   func EmptyMatchAllowedIfWildcard() MatchOptionFn {
        return withEmptyMatchTreatment(emptyMatchTreatmentAllowIfWildcard)
                return nil
        }
   }
   ```
   
   
   Personally, I think in this case, the former is preferred since it's just 
unnecessary indirection. It avoids two characters. 
   
   On the other hand, a gold standard in managing optional configuration is the 
`cmp` package, and it's tag along package `cmpopts`. Those defined their 
Options as an interface type with no exported methods (but one [internal filter 
method](https://github.com/google/go-cmp/blob/v0.5.9/cmp/options.go#L25)). So 
the lesson here is it's probably better to have users call a function that 
returns the option type, instead of it being direct, if only to better match 
user expectations around this sort of thing.
   
   So I'd say have users call functions to get the option func, if only to 
enable unforeseen changes in the future.
   
   Now I want to re-write the PAssert package with that in mind.... 
   
   ----
   
   Finally, since we end up hiding the enums for internal use only, we can 
reduce their names down. Eg. `emptyMatchTreatmentAllowIfWildcard` can be 
`emtAllowIfWildcard` (The type name can remain `emptyMatchTreatment`., and same 
for `directoryTreatment` with `dt` prefixes.
   
   Compression is the odd one out here (no good feelings on this), but if we 
can't think of a good reason for it not to follow the same pattern, we should 
be consistent with it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to