Hi Sandeep,

On 2024/4/4 07:57, Sandeep Dhavale wrote:
Add optimization to treat data blocks filled with 0s as a hole.
Even though diskspace savings are comparable to chunk based or dedupe,
having no block assigned saves us redundant disk IOs during read.

This patch detects if the block is filled with zeros and marks
chunk as erofs_holechunk so there is no physical block assigned.

Signed-off-by: Sandeep Dhavale <dhav...@google.com>
---
  lib/blobchunk.c | 25 ++++++++++++++++++++++++-
  1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/lib/blobchunk.c b/lib/blobchunk.c
index 641e3d4..8535058 100644
--- a/lib/blobchunk.c
+++ b/lib/blobchunk.c
@@ -232,6 +232,21 @@ static void erofs_update_minextblks(struct erofs_sb_info 
*sbi,
                *minextblks = lb;
  }
+static bool erofs_is_buf_zeros(void *buf, unsigned long len)
+{
+       int i, words;
+       const unsigned long *words_buf = buf;
+       words = len / sizeof(unsigned long);
+
+       DBG_BUGON(len % sizeof(unsigned long));
+
+       for (i = 0; i < words; i++) {
+               if (words_buf[i])
+                       return false;
+       }
+       return true;
+}
+
  int erofs_blob_write_chunked_file(struct erofs_inode *inode, int fd,
                                  erofs_off_t startoff)
  {
@@ -323,7 +338,15 @@ int erofs_blob_write_chunked_file(struct erofs_inode 
*inode, int fd,
                        ret = -EIO;
                        goto err;
                }
-
+               if (len == chunksize && erofs_is_buf_zeros(chunkdata, len)) {
+                       /* if data is all zeros, treat this block as hole */
+                       *(void **)idx++ = &erofs_holechunk;
+                       erofs_update_minextblks(sbi, interval_start, pos,
+                                               &minextblks);
+                       interval_start = pos + len;
+                       lastch = NULL;
+                       continue;
+               }
                chunk = erofs_blob_getchunk(sbi, chunkdata, len);

Yes, it's a valid opt.  Yet, I guess we could calculate the unique hash value
of all zeroes (with chunksize) in advance (e.g. when initialization).

And compare the buf hash and the all-zeroed hash in erofs_blob_getchunk(), if
they are the same, let's just compare all data then (or not compare since
it's little chance to have such collision. In that case, erofs_blob_getchunk
returns erofs_holechunk.

Does it sound a good idea to you?

Thanks,
Gao Xiang

                if (IS_ERR(chunk)) {
                        ret = PTR_ERR(chunk);

Reply via email to