[
https://issues.apache.org/jira/browse/BEAM-9316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17037332#comment-17037332
]
Eugene Kirpichov commented on BEAM-9316:
----------------------------------------
Could you elaborate why the current behavior is undesirable? Do I understand
correctly that in the given example it'll write to
gs://some/bucket/some/directory/ ? If so, that seems fairly intuitive to me and
arguably a useful thing to have: writing to subdirectories within a base
directory specified with .to().
I think making relativeFileNaming private would be unfortunate: it's a
backwards incompatible change (can break people who use it), and it seems
useful to have something of that sort.
As for checking whether a fileNaming is relative - I don't think it can be done
in general (what if somebody writes a new FileNaming that, logically, is
relative, but isn't constructed via relativeFileNaming). I guess you could
throw runtime errors if .to() is specified and the FileNaming *returns*
something that contains a slash? But again I'm not sure that's desirable.
I acknowledge that there is some confusion but I think the best way to address
it would be to rename things more clearly (and deprecate the old names). E.g.
maybe something like: to() -> toBaseDirectory() and relativeFileNaming() ->
toSubDirectory()? There's probably better options too.
> FileIO.Write.relativeFileNaming should not be public
> ----------------------------------------------------
>
> Key: BEAM-9316
> URL: https://issues.apache.org/jira/browse/BEAM-9316
> Project: Beam
> Issue Type: Improvement
> Components: io-java-files
> Reporter: Claire McGinty
> Priority: Major
>
> I think the existing FileIO.writeDynamic is a bit easy to misuse, as
> something like this looks correct, and compiles:
>
> {{ FileIO.writeDynamic()}}
> {{ .by(...)}}
> {{ .withNaming(new SerializableFunction[String, FileNaming] {}}
> {{ override def apply(str: String): FileNaming =}}
> {{ FileIO.Write.relativeFileNaming(}}
> {{ "some/directory",}}
> {{ new FileNaming {}}
> {{ override defFilename(window: BoundedWindow, pane: PaneInfo,
> numShards: Int, shardIndex: Int, compression: Compression): String =
> "some_filename.txt"}}{{}}}
> {{ .via(...)}}
> {{ .to("gs://some/bucket")}}
>
> However, for dynamic writes, if `outputDirectory` (.to("...")) is set, under
> the hood, Beam will wrap the provided `fileNamingFn` in
> `FileIO.Write.relativeFileNaming(...)` as well, so it ends up as a nested
> `relativeFileNaming` function.
> ([https://github.com/apache/beam/blob/da9e17288e8473925674a4691d9e86252e67d7d7/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java#L1243)|https://github.com/apache/beam/blob/da9e17288e8473925674a4691d9e86252e67d7d7/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java#L1243])]
>
> IMO, `relativeFileNaming` should either be made private, so that it's only
> used internally by FileIO.Write, or a precondition should be added when a
> dynamic FileIO.Write is expanded, to check that `outputDirectory` can't be
> set if the provided `fileNamingFn` is relative.
>
> wdyt?
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)