[ 
https://issues.apache.org/jira/browse/BEAM-12664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17549435#comment-17549435
 ] 

Danny McCormick commented on BEAM-12664:
----------------------------------------

This issue has been migrated to https://github.com/apache/beam/issues/21031

> Improve textio: Write sharding
> ------------------------------
>
>                 Key: BEAM-12664
>                 URL: https://issues.apache.org/jira/browse/BEAM-12664
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-go
>            Reporter: Robert Burke
>            Priority: P3
>
> The other SDKs have implementations that shard files on write. So should the 
> Go SDK. The feature is mentioned in the Beam Programming Guide:
> [https://beam.apache.org/documentation/programming-guide/#file-based-writing-multiple-files]
> It would be more expedient to provide an Xlang TextIO implementation for the
> Go SDK than to replicate the implementation in Go, at the cost of some
> execution-time performance.
> Ideally it would be similarly generalized to simplify writing file sinks.
> File sinks are necessarily complex in order to provide a robust and reliable
> implementation.
> Current Go implementation:
> [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/io/textio/textio.go#L119]
> Python FileIO implementation:
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/filebasedsink.py]
> (Note: iobase.Sink is deprecated, but is still suitable for file IO.)
> Java TextIO & FileIO:
> [https://github.com/apache/beam/blob/f8fbbfa309ac88848057de694d4cc1cba3eaa92a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L1259]
>  
> [https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java]
>
> KafkaIO (example of writing a Go SDK-side wrapper for an xlang Java IO):
> [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/io/xlang/kafkaio/kafka.go]
>
> General docs on writing sinks: 
> [https://beam.apache.org/documentation/io/developing-io-overview/#sinks] 
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
