FWIW capnp messages already encode their own size at the start of the message (or, rather, they encode a segment table, which you can sum up to get the total size).
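A minimal Python sketch of summing the segment table, assuming the standard unpacked stream framing: a little-endian uint32 holding the segment count minus one, then one uint32 size (in 8-byte words) per segment, padded to an 8-byte boundary. `framed_message_size` is a hypothetical helper name, not part of any capnp library, and it does no sanity checking on a corrupted table.

```python
import struct

WORD_BYTES = 8  # Cap'n Proto measures segment sizes in 8-byte words


def framed_message_size(buf, offset=0):
    """Return the total size in bytes (segment table + segments) of the
    stream-framed message starting at `offset`, or None if `buf` does not
    yet contain the whole segment table."""
    available = len(buf) - offset
    if available < 4:
        return None
    # First uint32: number of segments minus one.
    (count_minus_one,) = struct.unpack_from("<I", buf, offset)
    segment_count = count_minus_one + 1
    table_bytes = 4 * (1 + segment_count)
    if available < table_bytes:
        return None
    # One uint32 per segment: that segment's size in words.
    sizes = struct.unpack_from("<%dI" % segment_count, buf, offset + 4)
    # The table is padded so that the first segment starts on a word boundary.
    header_bytes = (table_bytes + WORD_BYTES - 1) // WORD_BYTES * WORD_BYTES
    return header_bytes + WORD_BYTES * sum(sizes)
```

For the 24-byte message from the original question, this would be an 8-byte table followed by a single 2-word segment.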
This might be useful:

https://github.com/sandstorm-io/capnproto/blob/master/c++/src/capnp/serialize.h#L111

-Kenton

On Fri, Apr 14, 2017 at 1:17 PM, <[email protected]> wrote:

> Thanks for the reply. Option 1 seems pretty reasonable to me. I would
> probably go as far as to frame the messages with magic + message size; that
> way, when there's another magic (or end of file) at current position +
> message size, I can verify that the message is probably correct.
>
> On Friday, April 14, 2017 at 1:08:55 PM UTC-7, Kenton Varda wrote:
>>
>> Hi Stepan,
>>
>> No, there's no easy way to detect the corruption you describe. In fact,
>> for most serialization formats, there's no solution to this problem. Once
>> you've lost track of message boundaries, it's impossible to tell the
>> difference between the start of a new message vs. data in the previous
>> message, since any message can contain arbitrary byte blobs (e.g. via the
>> `Data` type).
>>
>> If what you describe is a requirement for your use case, you could
>> accomplish it with an additional framing layer.
>>
>> Option 1: Choose a 128-bit unguessable random number before you start
>> writing. Write that number before each message. Now you can scan the bytes
>> of the file looking for this 128-bit sequence and, if you see it, you can
>> be fairly certain (false-positive probability ~= 2^-128) that a new message
>> starts after it. You have to use a new random number for every file in case
>> you ever embed a whole file into another file.
>>
>> Option 2: Choose a magic number to write before each message, *and* scan
>> the contents of each message for this number, replacing it with an "escape
>> sequence" if seen. Do the opposite transformation while reading. This
>> allows you to detect boundaries "perfectly" (zero probability of false
>> positive) but you lose the benefits of zero-copy due to the need to process
>> escape sequences.
>>
>> -Kenton
>>
>> On Fri, Apr 14, 2017 at 12:35 PM, <[email protected]> wrote:
>>
>>> I have a message that serializes into 24 bytes. I write two messages to
>>> a file, resulting in a file that's 48 bytes long. Now I truncate the file
>>> to 40 bytes and write one message, so the file now looks like this: 1 full
>>> message, 1 broken, 1 full message. Is there any way to iterate over the
>>> file and, when encountering the broken message, detect that it is broken
>>> and skip directly to the second full message? I've been using Python to
>>> read such a file with the following code:
>>>
>>> def main():
>>>     with open('dates.txt', 'r') as fp:
>>>         for date in date_capnp.Date.read_multiple(fp):
>>>             print(date)
>>>
>>> But it fails with the following message:
>>>
>>> Message contains non-struct pointer where struct pointer was expected
>>>
>>> Also, if it's possible to detect such a message, is it possible to get
>>> its position and length? Thank you.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Cap'n Proto" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> Visit this group at https://groups.google.com/group/capnproto.
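
Option 1, combined with the magic + message size idea from the reply, could be sketched roughly as follows in Python. The helper names (`new_marker`, `frame_record`, `iter_intact`) and the 8-byte little-endian length field are assumptions made for illustration; nothing here is part of Cap'n Proto or pycapnp. Each payload is meant to be the serialized bytes of one message.

```python
import os
import struct

MARKER_BYTES = 16                    # 128-bit marker, as in Option 1
LENGTH_FIELD = struct.Struct("<Q")   # assumed length field: uint64, little-endian


def new_marker():
    """Pick a fresh unguessable marker; use a new one for every file."""
    return os.urandom(MARKER_BYTES)


def frame_record(marker, payload):
    """Return marker + payload length + payload, ready to append to a file."""
    return marker + LENGTH_FIELD.pack(len(payload)) + payload


def iter_intact(data, marker):
    """Yield (payload_offset, payload) for every record that is followed by
    another marker or by end of file; damaged records are skipped."""
    pos = data.find(marker)
    while pos != -1:
        length_at = pos + len(marker)
        if length_at + LENGTH_FIELD.size > len(data):
            break  # not enough bytes left for a length field
        (size,) = LENGTH_FIELD.unpack_from(data, length_at)
        start = length_at + LENGTH_FIELD.size
        end = start + size
        if end == len(data):
            yield start, data[start:end]
            break
        if data[end:end + len(marker)] == marker:
            yield start, data[start:end]
            pos = end
        else:
            # Broken record: resynchronize by scanning for the next marker.
            pos = data.find(marker, pos + 1)
```

On a file built as in the original question (two records, the second truncated by 8 bytes, then one more record appended), `iter_intact` should yield the first and third payloads and skip the broken one. A reader could recover the marker from the first 16 bytes of the file, provided the first record is undamaged.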
