[
https://issues.apache.org/jira/browse/BEAM-14452?focusedWorklogId=769726&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-769726
]
ASF GitHub Bot logged work on BEAM-14452:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 12/May/22 15:23
Start Date: 12/May/22 15:23
Worklog Time Spent: 10m
Work Description: chamikaramj commented on PR #17599:
URL: https://github.com/apache/beam/pull/17599#issuecomment-1125129268
To clarify, the next steps are:
(1) Write a brief design doc explaining why this approach works for
reading/writing 4mc files (or proposing an alternative approach)
(2) Make sure that the PR passes current tests
(3) Add new unit tests related to reading and writing with 4mc compression.
See [1] and [2] for example tests.
(4) Add an integration test (optional)
[1]
https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOWriteTest.java
[2]
https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOWriteTest.java
Issue Time Tracking
-------------------
Worklog Id: (was: 769726)
Time Spent: 50m (was: 40m)
> Support Hadoop 4mc file format
> ------------------------------
>
> Key: BEAM-14452
> URL: https://issues.apache.org/jira/browse/BEAM-14452
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-core
> Reporter: Akhilesh Singh
> Priority: P2
> Time Spent: 50m
> Remaining Estimate: 0h
>
> 4MC (4 More Compression) is a library for Hadoop that provides a new
> splittable compressed file format (4mc), which lets you leverage the LZ4 and
> ZSTD compression algorithms.
> It differs from other compressed file formats in that its output files can be
> read using Hive partitioning, so queries can be fast and reads from client
> applications can target small compressed files stored on GCS.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)