On 2024/2/19 12:44, Gao Xiang wrote:
Hi Mike,

On 2024/2/19 11:37, Mike Baynton wrote:
Hello erofs developers,
I am integrating erofs with overlayfs in a manner similar to what
composefs is doing. So, I am interested in making erofs images
containing only file metadata and extended attributes, but no file
data, as in $ mkfs.erofs --tar=i (thanks for that!)

Thanks for your interest in EROFS too.


However, I would like to construct the erofs image from a set of files
selected dynamically by another program. This leads me to prefer
sending an unseekable stream to mkfs.erofs so that file selection and
image generation can run concurrently, instead of first making a
complete tarball and then making the erofs image. In this case, it
becomes necessary to transfer each file's worth of data through the
stream after each header only so that the tarball reader in tar.c does
not become desynchronized with the expected offset of the next tar
header.
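
To illustrate the synchronization concern, here is a minimal sketch of the offset arithmetic for the standard tar layout (512-byte header blocks, payload padded to a 512-byte boundary). The helper names `tar_padded_size` and `tar_next_header_offset` are hypothetical and are not taken from lib/tar.c.

```c
#include <stdint.h>

#define TAR_BLOCK 512u

/* Round a payload size up to the next multiple of the tar block size. */
static uint64_t tar_padded_size(uint64_t size)
{
	return (size + TAR_BLOCK - 1) / TAR_BLOCK * TAR_BLOCK;
}

/*
 * Offset at which a reader expects the next header, given the offset of
 * the current header and the file size recorded in it.
 */
static uint64_t tar_next_header_offset(uint64_t hdr_off, uint64_t size)
{
	return hdr_off + TAR_BLOCK + tar_padded_size(size);
}
```

For a header at offset 0 that records a 1000-byte file, the reader expects the next header at offset 1536. A producer that omits the payload writes that header at offset 512 instead, which is the desynchronization described above.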

I wonder if it's possible to use a modified prototype-like [1] format
which mkfs.xfs [2] currently supports with "-p".  This prototype can
be passed with a pipe instead.

[1] http://uw714doc.sco.com/en/man/html.4/prototype.4.html
[2] https://man7.org/linux/man-pages/man8/mkfs.xfs.8.html

.. the mkfs.xfs protofile format originally comes from the following syntax instead:
https://man.cat-v.org/unix-6th/8/mkfs



A very straightforward solution that seems to be working just fine for
me is to simply introduce a new optarg for --tar that indicates the
input data will be simply a series of tar headers / metadata without
actual file data. This implies index mode and additionally prevents
the skipping of inode.size worth of bytes after each header:

diff --git a/include/erofs/tar.h b/include/erofs/tar.h
index a76f740..3d40a0f 100644
--- a/include/erofs/tar.h
+++ b/include/erofs/tar.h
@@ -46,7 +46,7 @@ struct erofs_tarfile {

   int fd;
   u64 offset;
- bool index_mode, aufs;
+ bool index_mode, headeronly_mode, aufs;
  };

  void erofs_iostream_close(struct erofs_iostream *ios);
diff --git a/lib/tar.c b/lib/tar.c
index 8204939..e916395 100644
--- a/lib/tar.c
+++ b/lib/tar.c
@@ -584,7 +584,7 @@ static int tarerofs_write_file_index(struct erofs_inode *inode,
   ret = tarerofs_write_chunkes(inode, data_offset);
   if (ret)
   return ret;
- if (erofs_iostream_lskip(&tar->ios, inode->i_size))
+ if (!tar->headeronly_mode && erofs_iostream_lskip(&tar->ios, inode->i_size))
   return -EIO;
   return 0;
  }
diff --git a/mkfs/main.c b/mkfs/main.c
index 6d2b700..a72d30e 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -122,7 +122,7 @@ static void usage(void)
         " --max-extent-bytes=#  set maximum decompressed extent size #
in bytes\n"
         " --preserve-mtime      keep per-file modification time strictly\n"
         " --aufs                replace aufs special files with
overlayfs metadata\n"
-       " --tar=[fi]            generate an image from tarball(s)\n"
+       " --tar=[fih]           generate an image from tarball(s) or tarball header data\n"
         " --ovlfs-strip=[01]    strip overlayfs metadata in the target
image (e.g. whiteouts)\n"
         " --quiet               quiet execution (do not write anything
to standard output.)\n"
  #ifndef NDEBUG
@@ -514,11 +514,13 @@ static int mkfs_parse_options_cfg(int argc, char *argv[])
   cfg.c_extra_ea_name_prefixes = true;
   break;
   case 20:
- if (optarg && (!strcmp(optarg, "i") ||
- !strcmp(optarg, "0") || !memcmp(optarg, "0,", 2))) {
+ if (optarg && (!strcmp(optarg, "i") || (!strcmp(optarg, "h") ||
+ !strcmp(optarg, "0") || !memcmp(optarg, "0,", 2)))) {
   erofstar.index_mode = true;
   if (!memcmp(optarg, "0,", 2))
   erofstar.mapfile = strdup(optarg + 2);
+ if (!strcmp(optarg, "h"))
+ erofstar.headeronly_mode = true;
   }
   tar_mode = true;
   break;
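
The option handling in the last hunk can also be read as a small standalone function. The following is only a sketch of the proposed behaviour: `struct tar_opts` and `parse_tar_optarg` are hypothetical names standing in for the `erofstar` fields, and `strncmp` is used in place of `memcmp` so that no bytes past the string terminator are compared.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the flags kept in struct erofs_tarfile. */
struct tar_opts {
	bool index_mode;
	bool headeronly_mode;
	const char *mapfile;	/* points into optarg, NULL if not given */
};

/* Classify the optional argument of --tar as the patch proposes. */
static struct tar_opts parse_tar_optarg(const char *optarg)
{
	struct tar_opts o = { false, false, NULL };

	if (!optarg)
		return o;		/* plain --tar: full tarball mode */
	if (!strcmp(optarg, "h")) {
		o.index_mode = true;	/* header-only implies index mode */
		o.headeronly_mode = true;
	} else if (!strcmp(optarg, "i") || !strcmp(optarg, "0")) {
		o.index_mode = true;
	} else if (!strncmp(optarg, "0,", 2)) {
		o.index_mode = true;
		o.mapfile = optarg + 2;
	}
	return o;
}
```

With this sketch, `parse_tar_optarg("h")` sets both flags, while `"i"`, `"0"` and `"0,<mapfile>"` keep their existing meaning.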

Using this requires generation of tarball-ish streams that can be
slightly difficult to cajole tar libraries into creating, but it does
work if you do it. I can imagine much more complex alternative ways to
do this too, such as supporting sparse tar files or supporting some
whole new input format.
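
As an illustration of what producing such a stream by hand involves, here is a minimal sketch that fills one ustar header block for a regular file without relying on a tar library. The function name `ustar_fill_header` is hypothetical, uid and gid are fixed to 0, long names and extended attributes (which would need pax records) are not handled, and whether the resulting stream is accepted depends on the header-only mode proposed above.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Fill a 512-byte ustar header for a regular file.
 * Returns 0 on success, -1 if a value does not fit its field.
 */
static int ustar_fill_header(unsigned char hdr[512], const char *name,
			     unsigned int mode, uint64_t size, uint64_t mtime)
{
	const uint64_t max11 = (1ULL << 33) - 1;	/* 11 octal digits */
	size_t len = strlen(name), i;
	unsigned int sum = 0;

	if (len > 100 || size > max11 || mtime > max11)
		return -1;
	memset(hdr, 0, 512);
	memcpy(hdr, name, len);				/* name:  offset 0   */
	snprintf((char *)hdr + 100, 8, "%07o", mode & 07777);
	snprintf((char *)hdr + 108, 8, "%07o", 0u);	/* uid:   offset 108 */
	snprintf((char *)hdr + 116, 8, "%07o", 0u);	/* gid:   offset 116 */
	snprintf((char *)hdr + 124, 12, "%011llo", (unsigned long long)size);
	snprintf((char *)hdr + 136, 12, "%011llo", (unsigned long long)mtime);
	memset(hdr + 148, ' ', 8);	/* checksum field is summed as spaces */
	hdr[156] = '0';			/* typeflag: regular file */
	memcpy(hdr + 257, "ustar", 6);	/* magic, including the NUL */
	memcpy(hdr + 263, "00", 2);	/* version */
	for (i = 0; i < 512; i++)
		sum += hdr[i];
	snprintf((char *)hdr + 148, 7, "%06o", sum);	/* 6 digits + NUL */
	hdr[155] = ' ';
	return 0;
}
```

A header-only producer would write each filled block to the pipe with `fwrite()` and no payload after it, then finish with two all-zero 512-byte blocks as the usual end-of-archive marker.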

I think you could just fill the file data with zeros to use the current index mode for now.
But yes, it could be inefficient if some files are huge.
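
A minimal sketch of that zero-fill workaround, assuming the standard tar convention that each payload is padded to a 512-byte boundary. The helper name `tar_write_zero_payload` is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Write the placeholder payload a tar reader expects after a header that
 * records `size` bytes: zeros, padded up to a multiple of 512.
 * Returns 0 on success, -1 on a short write.
 */
static int tar_write_zero_payload(FILE *out, uint64_t size)
{
	static const unsigned char zeros[4096];
	uint64_t left = (size + 511) / 512 * 512;

	while (left > 0) {
		size_t n = left < sizeof(zeros) ? (size_t)left : sizeof(zeros);

		if (fwrite(zeros, 1, n, out) != n)
			return -1;
		left -= n;
	}
	return 0;
}
```

The cost is proportional to the recorded file size, so a 10 GiB file means 10 GiB of zeros pushed through the pipe only to be skipped by the reader.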


Would some version of this feature be interesting and useful? If so,
is the simple way good enough? It wouldn't preclude future addition of
things like a sparse tar reader.

Yes, I think it's useful to support a simple prototype-like format, but
it might take me some time, since there is other ongoing work to land
first (like multi-threaded mkfs support).

Thanks,
Gao Xiang


Regards,
Mike
