[ https://issues.apache.org/jira/browse/BEAM-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kenneth Knowles updated BEAM-8180:
----------------------------------
Status: Open (was: Triage Needed)
> Files managed by beam should have associated AVPs such as content-type and
> content-encoding instead of merely mimeType
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: BEAM-8180
> URL: https://issues.apache.org/jira/browse/BEAM-8180
> Project: Beam
> Issue Type: Improvement
> Components: io-java-text
> Environment: Google Compute Platform DataFlow
> Reporter: C.J. Collier
> Priority: Minor
>
> From customer:
>
> {quote}We've updated our DataFlow templates to read and write with gzip
> compression. I noticed that when a .gz file is written, the object's metadata
> defaults to "application/octet-stream" for Content-Type because it doesn't
> know what it is. I would like each file to have text/plain for
> content-type and gzip for content-encoding. We may also add other metadata
> key/value pairs. I can't find a way to programmatically set these and other
> metadata values per object within DataFlow. I'm using TextIO right now and
> just doing .withCompression. I didn't see any other functions to achieve this
> or any DataFlow doc on it. Am I missing something?
> {quote}
>
> The MIME type of the output file can be set by supplying TextIO with your own
> WritableByteChannelFactory that reports the desired MIME type[0].
> The default WritableByteChannelFactory for TextIO reports "text/plain", but
> when "withCompression" is used, the MIME type becomes
> "application/octet-stream"[1][2].
> Unfortunately, FileSystems.create[3] does not support setting a
> content-encoding on the output channel. I will ensure that this specific point
> is captured in the feature request, though at this point it becomes an
> upstream change to Beam rather than a change to Dataflow.
> [0]
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L1175
> [1]
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java#L874
> [2]
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/MimeTypes.java
> [3]
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java#L224
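
The workaround described in the issue (a channel factory that compresses the output but still reports a text MIME type) might look roughly like the sketch below. It uses only the JDK so that it runs on its own: `ChannelFactory`, `GzipTextPlainFactory`, `writeCompressed`, and `readCompressed` are hypothetical names, and `ChannelFactory` is a local stand-in loosely shaped after Beam's WritableByteChannelFactory, not the Beam interface itself. It covers only the MIME type half of the request. As the issue notes, there is currently no corresponding hook for content-encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipTextChannelSketch {

  /** Hypothetical stand-in for a Beam-style channel factory; NOT the Beam type. */
  interface ChannelFactory {
    WritableByteChannel create(WritableByteChannel channel) throws IOException;

    String getMimeType();
  }

  /** Gzip-compresses the bytes but still reports "text/plain" as the MIME type. */
  static final class GzipTextPlainFactory implements ChannelFactory {
    @Override
    public WritableByteChannel create(WritableByteChannel channel) throws IOException {
      // Wrap the raw channel so everything written to it is gzip-compressed.
      OutputStream gzip = new GZIPOutputStream(Channels.newOutputStream(channel));
      return Channels.newChannel(gzip);
    }

    @Override
    public String getMimeType() {
      // Assumption: the desired Content-Type describes the uncompressed payload.
      return "text/plain";
    }
  }

  /** Writes text through the factory into memory and returns the compressed bytes. */
  static byte[] writeCompressed(ChannelFactory factory, String text) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    // Closing the wrapped channel finishes the gzip stream and flushes the trailer.
    try (WritableByteChannel out = factory.create(Channels.newChannel(sink))) {
      ByteBuffer buf = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
      while (buf.hasRemaining()) {
        out.write(buf);
      }
    }
    return sink.toByteArray();
  }

  /** Decompresses gzip bytes back to a string, to check the round trip. */
  static String readCompressed(byte[] gzipped) throws IOException {
    try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) throws IOException {
    ChannelFactory factory = new GzipTextPlainFactory();
    byte[] gzipped = writeCompressed(factory, "hello\nworld\n");
    System.out.println("MIME type: " + factory.getMimeType());
    System.out.println("Round trip ok: " + readCompressed(gzipped).equals("hello\nworld\n"));
  }
}
```

In a real pipeline the equivalent factory would implement Beam's own interface and be supplied to TextIO as the issue describes. The Content-Encoding header would still be unset until FileSystems.create gains a way to carry it.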