[ 
https://issues.apache.org/jira/browse/HDDS-7350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17632420#comment-17632420
 ] 

Stephen O'Donnell commented on HDDS-7350:
-----------------------------------------

{quote}
In other words, if compression and/or encryption is enabled, each chunk will 
contain compressed and/or encrypted data. The processing stream is created per 
each served key, not for each chunk.
For example, we want to upload a 1GB file into Ozone, the transparent 
compression reduces the total file size to 800MB, which is further split into 
4MB chunks and transferred to the server.
{quote}

I think this approach will limit the usefulness of the feature, as it will 
prevent someone from being able to seek within the file. E.g., if I have a 1GB 
file and I want to seek to 500MB - how can I get there without reading all of 
the data until I reach 500MB? With a non-compressed file, we can go to a block 
and then a chunk offset and then seek within the chunk. You should look at how 
ZFS does transparent compression - it allows seeking within a file just as if 
it was not compressed. I've got a feeling we will need to have a compression 
stream at the chunk level.
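
To illustrate the seek concern, here is a minimal sketch of chunk-level compression. It is not Ozone code: the function names are hypothetical, zlib only stands in for whatever codec would be used, and it assumes each fixed-size logical chunk is compressed independently (so the stored chunks vary in size, unlike the proposal, where the compressed stream is cut into 4MB pieces).

```python
import zlib

# 4MB logical chunks, matching the chunk size mentioned in the quoted proposal.
CHUNK_SIZE = 4 * 1024 * 1024


def compress_chunks(data, chunk_size=CHUNK_SIZE):
    """Compress each fixed-size logical chunk independently (hypothetical helper)."""
    return [
        zlib.compress(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def read_at(chunks, offset, length, chunk_size=CHUNK_SIZE):
    """Read `length` bytes starting at logical `offset`.

    Only the chunks overlapping the requested range are decompressed;
    everything before `offset` is skipped by plain arithmetic.
    """
    out = bytearray()
    index = offset // chunk_size   # which chunk holds the offset
    skip = offset % chunk_size     # position inside that chunk
    while length > 0 and index < len(chunks):
        plain = zlib.decompress(chunks[index])
        piece = plain[skip:skip + length]
        if not piece:
            break  # offset is past the end of the data
        out += piece
        length -= len(piece)
        index += 1
        skip = 0
    return bytes(out)
```

With 4MB logical chunks, a seek to 500MB lands at the start of chunk 125 (500 / 4), so only that chunk needs to be decompressed. With a single compression stream per key there is no such mapping from a logical offset to a position in the stored data.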

We also need to think through how compression would work with EC. Perhaps it 
does not need to be implemented in the first phase, but we need to think it out 
so there is not some limitation of the design which would make it very 
difficult to add later.

> Ozone Transparent Data Compression Support
> ------------------------------------------
>
>                 Key: HDDS-7350
>                 URL: https://issues.apache.org/jira/browse/HDDS-7350
>             Project: Apache Ozone
>          Issue Type: New Feature
>            Reporter: Kirill Sizov
>            Assignee: Kirill Sizov
>            Priority: Major
>         Attachments: compression_ozone - 2022.10.1.pdf, 
> compression_ozone-2022.10.2.pdf, compression_ozone-2022.11.1.pdf, 
> compression_ozone-2022.11.2.pdf
>
>
> Currently Ozone stores uncompressed data, which in the case of text or a 
> similar format may benefit from being compressed. This may save a significant 
> amount of space and hence money.
> See the attached document for the design.



