tra added a comment.

In D122069#3397560 <https://reviews.llvm.org/D122069#3397560>, @jhuber6 wrote:

> I was definitely thinking about a lot of alternatives to making yet another 
> on-disk binary format. I had a few discussions with @JonChesterfield on the 
> subject. There were two things that put me off of using JSON primarily, 
> correct me if I have any misconceptions.
>
> 1. This is fundamentally a very thin metadata wrapper around a binary image. 
> AFAIK if you want to stream binary data to a JSON you need to use a base64 or 
> similar encoding, which makes the files bigger and adds some extra work.

I didn't realize that content of the file is part of your packagin scheme. I've 
interpreted `embed certain metadata along with a binary image ` as literally 
keeping the binaries as binaries and just adding a small blob with additional 
metadata. It was that data I meant to encode as JSON. Encoding the offload 
binaries themselves as JSON would indeed be wasteful.

We could keep the header as a binary (never changes on-disk format) and use 
JSON representation for the array of the entries (0-terminated string or string 
+ length stored in the header) which also has fixed (as in version-agnostic) 
on-disk format, though of variable length.
Versioning still has to be dealt with, but now it would be independent of the 
on-disk format. The variable length of JSON is both a plus and a minus. On the 
positive side is that content is open. Some tool may add whatever is relevant 
for its own use, comments, provenance info, checksum, etc.
The downside is that it has variable length, so it would have to be written 
after the image binary. We would also need to deal with potential errors 
parsing JSON.

I don't have a strong preference either way. I think JSON may have few minor 
benefits, but the proposed binary format has the advantage of simplicity. We 
can always switch to json-encoded entries later by bumping the header version.



================
Comment at: clang/include/clang/Basic/Offloading.h:87-104
+  struct Header {
+    uint8_t Magic[4] = {0x10, 0xFF, 0x10, 0xAD};
+    uint32_t Version;
+    uint64_t Size;
+    uint64_t EntryOffset;
+  };
+
----------------
jhuber6 wrote:
> jhuber6 wrote:
> > tra wrote:
> > > tra wrote:
> > > > Given that it's an on-disk format I think these should be explicitly 
> > > > packed and carry a comment that it's an on-disk format which needs 
> > > > extra caution during future changes.
> > > > 
> > > > 
> > > On-disk format could use more comments.
> > What do you mean by explicitly packed? And I added the "Version" field in 
> > the header so we can warn on old versions.
> Noted, will add more if we settle on this format.
`__attribute__((packed))`. Otherwise you depend on assumed natural alignment 
and that is target-dependent, IIRC.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122069/new/

https://reviews.llvm.org/D122069

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to