damccorm opened a new issue, #19828:
URL: https://github.com/apache/beam/issues/19828

   From customer:
   
    
   > We've updated our DataFlow templates to read and write with gzip 
compression. I noticed when .gz file is written the object's metadata defaults 
to "application/octet-stream" for Content-Type because it doesn't know what it 
is. I would like to have each file be text/plain for content-type and gzip for 
content-encoding. We may also add other metadata key/value pairs. I can't find 
a way to programmatically set these and other metadata values per object within 
DataFlow. I'm using TextIO right now and just doing .withCompression. I didn't 
see any other functions to achieve this or any DataFlow doc on it. Am I missing 
something?
   >  
   
   The MIME type of the output file can be set by supplying your own 
WritableByteChannelFactory to TextIO which sets the MIME type to your desired 
value[0].
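   
   As a rough illustration of that suggestion, here is a minimal sketch of a 
factory that gzips the bytes while still reporting "text/plain". To stay 
runnable without Beam on the classpath, it uses a local stand-in interface 
(ChannelFactory, a hypothetical name) that mirrors the shape of 
FileBasedSink.WritableByteChannelFactory; the class names and the compress and 
decompress helpers are also assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipTextPlainSketch {

  /** Hypothetical stand-in with the same shape as Beam's WritableByteChannelFactory. */
  interface ChannelFactory extends Serializable {
    WritableByteChannel create(WritableByteChannel channel) throws IOException;
    String getMimeType();
    String getSuggestedFilenameSuffix();
  }

  /** Gzips the output but reports text/plain instead of application/octet-stream. */
  static class GzipTextPlainFactory implements ChannelFactory {
    @Override
    public WritableByteChannel create(WritableByteChannel channel) throws IOException {
      // Wrap the destination channel so every byte written to it is gzip-compressed.
      return Channels.newChannel(new GZIPOutputStream(Channels.newOutputStream(channel), true));
    }

    @Override
    public String getMimeType() {
      return "text/plain";
    }

    @Override
    public String getSuggestedFilenameSuffix() {
      return ".gz";
    }
  }

  /** Writes the text through the factory's channel and returns the compressed bytes. */
  static byte[] compress(ChannelFactory factory, String text) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    // Closing the wrapped channel finishes the gzip stream and writes its trailer.
    try (WritableByteChannel out = factory.create(Channels.newChannel(sink))) {
      ByteBuffer buf = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
      while (buf.hasRemaining()) {
        out.write(buf);
      }
    }
    return sink.toByteArray();
  }

  /** Decompresses gzip bytes back to text, to check the round trip. */
  static String decompress(byte[] gz) throws IOException {
    ByteArrayOutputStream text = new ByteArrayOutputStream();
    try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
      byte[] chunk = new byte[4096];
      int n;
      while ((n = in.read(chunk)) != -1) {
        text.write(chunk, 0, n);
      }
    }
    return new String(text.toByteArray(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    ChannelFactory factory = new GzipTextPlainFactory();
    byte[] gz = compress(factory, "hello\nworld\n");
    System.out.println(factory.getMimeType());
    System.out.println(decompress(gz).equals("hello\nworld\n"));
  }
}
```

   In a real pipeline the factory would implement 
FileBasedSink.WritableByteChannelFactory instead of the stand-in and be passed 
to TextIO.write().withWritableByteChannelFactory(...). Note that this only 
changes the Content-Type; the Content-Encoding would still be unset, so the 
object would hold gzip bytes labelled as plain text.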
   
   The default WritableByteChannelFactory for TextIO produces files with the 
MIME type "text/plain", but when "withCompression" is used, the MIME type 
becomes "application/octet-stream"[1][2].
   
   Unfortunately, FileSystems.create[3] does not support setting a 
content-encoding on the output channel. I will make sure this point is captured 
in the feature request, although supporting it would be an upstream change to 
Beam rather than a change to Dataflow.
   
   [0] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L1175
   
   [1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java#L874
   
   [2] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/MimeTypes.java
   
   [3] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java#L224
   
   Imported from Jira 
[BEAM-8180](https://issues.apache.org/jira/browse/BEAM-8180). Original Jira may 
contain additional context.
   Reported by: cjac.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.