Robert Burke created BEAM-12664:
-----------------------------------

             Summary: Improve textio: Write sharding
                 Key: BEAM-12664
                 URL: https://issues.apache.org/jira/browse/BEAM-12664
             Project: Beam
          Issue Type: Improvement
          Components: sdk-go
            Reporter: Robert Burke


The other SDKs have implementations that shard files on write. So should the Go 
SDK. The feature is mentioned in the Beam Programming Guide:

[https://beam.apache.org/documentation/programming-guide/#file-based-writing-multiple-files]

It would be more expedient to provide a cross-language (Xlang) TextIO 
wrapper for the Go SDK than to replicate the implementation in Go, at the 
cost of some execution-time performance.

Ideally it would be similarly generalized to simplify writing file sinks. File 
sinks are necessarily complex, since they must provide a robust and reliable 
implementation.

Current Go implementation:

[https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/io/textio/textio.go#L119]

Python FileIO implementation:

[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/filebasedsink.py]

(Note: iobase.Sink is deprecated, but is still suitable for file IO.)

Java TextIO & FileIO:

[https://github.com/apache/beam/blob/f8fbbfa309ac88848057de694d4cc1cba3eaa92a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L1259]

[https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java]

KafkaIO (example of writing a Go SDK-side wrapper for an xlang Java IO):

[https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/io/xlang/kafkaio/kafka.go]

General docs on writing sinks:

[https://beam.apache.org/documentation/io/developing-io-overview/#sinks]

