[ 
https://issues.apache.org/jira/browse/BEAM-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538589#comment-16538589
 ] 

Kyle Winkelman commented on BEAM-3095:
--------------------------------------

[https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L147]
{code:java}
PCollection<String> lines = ...;
lines.apply(TextIO.write().to("/path/to/file.txt"))
     .withSuffix(".txt")
     .withCompression(Compression.GZIP));{code}
This is the example that does not work: there is no .withCompression() 
method on the TextIO.Write class.

I believe the below would work (but it uses deprecated APIs):
{code:java}
PCollection<String> lines = ...;
lines.apply(TextIO.write().to("/path/to/file.txt")
     .withSuffix(".txt")
     .withWritableByteChannelFactory(FileBasedSink.CompressionType.fromCanonical(Compression.GZIP)));
{code}
This Jira should probably track adding a convenience method to the TextIO.Write 
class, similar to the one in the TextIO.TypedWrite class.

[https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L886]
{code:java}
public TypedWrite<UserT, DestinationT> withCompression(Compression compression) {
    checkArgument(compression != null, "compression can not be null");
    return withWritableByteChannelFactory(FileBasedSink.CompressionType.fromCanonical(compression));
}
{code}
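
For illustration, the convenience method on Write could probably just forward to the TypedWrite one. Below is a minimal, self-contained sketch of that delegation. It uses simplified stand-in classes, not the real Beam types: the WithCompressionSketch class, the assumption that Write wraps a TypedWrite in an "inner" field, the two-value Compression enum, and the getters are all hypothetical and exist only to make the example runnable.

```java
/** Self-contained model of the proposed delegation; NOT the real Beam classes. */
public class WithCompressionSketch {

    /** Stand-in for Beam's Compression enum (only two values modeled here). */
    public enum Compression { UNCOMPRESSED, GZIP }

    /** Stand-in for TextIO.TypedWrite: immutable, each with* call returns a new instance. */
    public static final class TypedWrite {
        private final String suffix;
        private final Compression compression;

        TypedWrite(String suffix, Compression compression) {
            this.suffix = suffix;
            this.compression = compression;
        }

        public TypedWrite withSuffix(String suffix) {
            return new TypedWrite(suffix, compression);
        }

        public TypedWrite withCompression(Compression compression) {
            // Mirrors the null check in the TypedWrite method quoted above.
            if (compression == null) {
                throw new IllegalArgumentException("compression can not be null");
            }
            return new TypedWrite(suffix, compression);
        }

        public String getSuffix() { return suffix; }
        public Compression getCompression() { return compression; }
    }

    /** Stand-in for TextIO.Write, assumed here to wrap a TypedWrite. */
    public static final class Write {
        private final TypedWrite inner;

        public Write() { this(new TypedWrite("", Compression.UNCOMPRESSED)); }
        Write(TypedWrite inner) { this.inner = inner; }

        public Write withSuffix(String suffix) {
            return new Write(inner.withSuffix(suffix));
        }

        /** The proposed convenience method: forward to the wrapped TypedWrite. */
        public Write withCompression(Compression compression) {
            return new Write(inner.withCompression(compression));
        }

        public TypedWrite getInner() { return inner; }
    }

    public static void main(String[] args) {
        // With the delegate in place, the documented call chain type-checks on Write.
        Write write = new Write().withSuffix(".txt").withCompression(Compression.GZIP);
        System.out.println(write.getInner().getSuffix() + " " + write.getInner().getCompression());
    }
}
```

If Write really does wrap a TypedWrite this way, the change to TextIO would be a single forwarding method, and the documented example would compile once its parentheses are balanced.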

> .withCompression() hinted at in docs, but not usable
> ----------------------------------------------------
>
>                 Key: BEAM-3095
>                 URL: https://issues.apache.org/jira/browse/BEAM-3095
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Rafael Fernandez
>            Assignee: Chamikara Jayalath
>            Priority: Major
>
> There is a FileBasedSink.CompressionType enum, and a comment in TextIO.java 
> that suggests .withCompression(...) is available. Alas, there does not seem 
> to be a documented way to write compressed output. It's unclear whether the 
> documentation is wrong, or the functionality is indeed missing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
