Robert Burke created BEAM-12664:
-----------------------------------
Summary: Improve textio: Write sharding
Key: BEAM-12664
URL: https://issues.apache.org/jira/browse/BEAM-12664
Project: Beam
Issue Type: Improvement
Components: sdk-go
Reporter: Robert Burke
The other SDKs have implementations that shard files on write. So should the Go
SDK. The feature is mentioned in the Beam Programming Guide:
[https://beam.apache.org/documentation/programming-guide/#file-based-writing-multiple-files]
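For illustration only, here is a minimal sketch of what sharding on write involves, written in plain Go without the Beam SDK. The helper names (shardName, shardFor, groupByShard) are hypothetical and not part of textio. The "-SSSSS-of-NNNNN" naming pattern is an assumption modeled on the default shard template used by Java's TextIO, and hashing each line to pick a shard is just one possible assignment strategy.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardName builds a file name of the form prefix-SSSSS-of-NNNNNsuffix.
// The zero-padded pattern is an assumption modeled on the other SDKs' defaults.
func shardName(prefix, suffix string, shard, numShards int) string {
	return fmt.Sprintf("%s-%05d-of-%05d%s", prefix, shard, numShards, suffix)
}

// shardFor picks a shard in [0, numShards) by hashing the line contents.
// (Hypothetical strategy; a runner could equally assign shards per bundle.)
func shardFor(line string, numShards int) int {
	h := fnv.New32a()
	h.Write([]byte(line))
	return int(h.Sum32() % uint32(numShards))
}

// groupByShard maps each output file name to the lines destined for it.
func groupByShard(lines []string, prefix, suffix string, numShards int) map[string][]string {
	out := make(map[string][]string)
	for _, l := range lines {
		name := shardName(prefix, suffix, shardFor(l, numShards), numShards)
		out[name] = append(out[name], l)
	}
	return out
}

func main() {
	groups := groupByShard([]string{"a", "b", "c", "d"}, "out", ".txt", 3)
	for name, ls := range groups {
		fmt.Println(name, len(ls))
	}
}
```

In a real pipeline each group would be written by a separate worker, which is where the parallelism comes from.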
It would be more expedient to provide a cross-language (Xlang) TextIO
implementation for the Go SDK than to replicate the implementation in Go, at
the cost of some execution-time performance.
Ideally, the implementation would be similarly generalized to simplify writing
file sinks. File sinks are necessarily complex, since they must provide a
robust and reliable implementation.
Current Go implementation:
[https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/io/textio/textio.go#L119]
Python FileIO implementation:
[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/filebasedsink.py]
(Note iobase.Sink is deprecated, but is still suitable for file io.)
Java TextIO & FileIO:
[https://github.com/apache/beam/blob/f8fbbfa309ac88848057de694d4cc1cba3eaa92a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L1259]
[https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java]
KafkaIO (example of writing a Go SDK-side wrapper for an xlang Java IO):
[https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/io/xlang/kafkaio/kafka.go]
General docs on writing sinks:
[https://beam.apache.org/documentation/io/developing-io-overview/#sinks]