[ 
https://issues.apache.org/jira/browse/BEAM-14452?focusedWorklogId=769726&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-769726
 ]

ASF GitHub Bot logged work on BEAM-14452:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 12/May/22 15:23
            Start Date: 12/May/22 15:23
    Worklog Time Spent: 10m 
      Work Description: chamikaramj commented on PR #17599:
URL: https://github.com/apache/beam/pull/17599#issuecomment-1125129268

   To clarify, the next steps are:
   
   (1) Write a brief design doc explaining why this approach works for 
reading and writing 4mc files (or propose an alternative approach).
   (2) Make sure that the PR passes the current tests.
   (3) Add new unit tests for reading and writing with 4mc compression. 
See [1] and [2] for example tests.
   (4) Add an integration test (optional).
   
   [1] 
https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOWriteTest.java
   [2] 
https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOWriteTest.java
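
   As a rough illustration of the kind of read/write check that step (3) 
asks for, here is a minimal round-trip sketch. It is not the Beam test 
itself: it uses only the JDK, GZIP stands in for the 4mc codec (which is 
not part of the JDK), and the class and method names are hypothetical. A 
real Beam test would exercise the PR's codec through the pipeline test 
utilities used in [1].

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: GZIP is an assumed stand-in for the 4mc codec.
class CompressionRoundTripSketch {

  // Writes each line, newline-terminated, through the compressing stream.
  static byte[] writeCompressed(List<String> lines) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (Writer writer =
        new OutputStreamWriter(new GZIPOutputStream(sink), StandardCharsets.UTF_8)) {
      for (String line : lines) {
        writer.write(line);
        writer.write('\n');
      }
    } // Closing the writer finishes the GZIP stream and flushes the trailer.
    return sink.toByteArray();
  }

  // Decompresses the bytes and splits the text back into lines.
  static List<String> readCompressed(byte[] data) throws IOException {
    List<String> lines = new ArrayList<>();
    try (BufferedReader reader =
        new BufferedReader(
            new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(data)),
                StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        lines.add(line);
      }
    }
    return lines;
  }

  public static void main(String[] args) throws IOException {
    // Include an empty line, since it is an easy edge case to lose.
    List<String> expected = Arrays.asList("first line", "", "third line");
    List<String> actual = readCompressed(writeCompressed(expected));
    if (!expected.equals(actual)) {
      throw new AssertionError("round trip mismatch: " + actual);
    }
    System.out.println("round trip ok: " + actual.size() + " lines");
  }
}
```

   The point of the sketch is the shape of the check: write known records 
through the compressing writer, read them back through the matching 
reader, and compare the result with the input. Since 4mc is described as 
splittable, tests for it would presumably also need to cover reads that 
start at a split boundary, which this sketch does not attempt.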




Issue Time Tracking
-------------------

    Worklog Id:     (was: 769726)
    Time Spent: 50m  (was: 40m)

> Support Hadoop 4mc file format
> ------------------------------
>
>                 Key: BEAM-14452
>                 URL: https://issues.apache.org/jira/browse/BEAM-14452
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core
>            Reporter: Akhilesh Singh
>            Priority: P2
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> 4MC (4 More Compression) is a library for Hadoop that provides a new 
> splittable compressed file format (4mc), which lets you leverage the power 
> of the LZ4 and ZSTD algorithms.
> It differs from other compressed file formats in that the output files 
> can be read using Hive partitioning, so queries can be fast and reads from 
> client applications can target small compressed files stored on 
> GCS. 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
