Thanks for the reply. Option 1 seems pretty reasonable to me. I would 
probably go as far as to frame each message with magic + message size. That 
way, if there's another magic (or end of file) at current position + 
message size, I can assume the frame is probably correct.
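
A minimal sketch of that framing in Python, under a few assumptions that are 
not from the thread: the length field is an 8-byte little-endian integer, the 
reader already knows the file's magic (for example, because it reads it from 
the first frame), and the helper names new_magic, write_framed and 
read_framed are made up for illustration. The payload is whatever bytes the 
serializer produces for one message.

```python
import os
import struct

MAGIC_LEN = 16                # 128-bit marker, as in Option 1
HEADER = struct.Struct("<Q")  # assumed layout: 8-byte little-endian payload size


def new_magic():
    """Pick a fresh random marker; use a new one for every file."""
    return os.urandom(MAGIC_LEN)


def write_framed(fp, magic, payload):
    """Append one frame: magic, payload size, payload bytes."""
    fp.write(magic + HEADER.pack(len(payload)) + payload)


def read_framed(data, magic):
    """Yield (offset, payload) for every frame that passes the boundary check.

    A frame is accepted only if another magic, or the end of the data,
    sits exactly at payload start + declared size.
    """
    pos = data.find(magic)
    while pos != -1:
        body = pos + len(magic) + HEADER.size
        if body <= len(data):
            (size,) = HEADER.unpack_from(data, pos + len(magic))
            end = body + size
            if end == len(data) or data.startswith(magic, end):
                yield body, data[body:end]
                pos = end if end < len(data) else -1
                continue
        # Broken frame: resume scanning for the next magic one byte later.
        pos = data.find(magic, pos + 1)
```

Each yielded offset together with len(payload) gives the position and length 
of an intact message, and the gaps between accepted frames are the broken 
regions. The sketch works on the whole file contents held in memory; a 
streaming reader would need to buffer across chunk boundaries.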

On Friday, April 14, 2017 at 1:08:55 PM UTC-7, Kenton Varda wrote:
>
> Hi Stepan,
>
> No, there's no easy way to detect the corruption you describe. In fact, 
> for most serialization formats, there's no solution to this problem. Once 
> you've lost track of message boundaries, it's impossible to tell the 
> difference between the start of a new message vs. data in the previous 
> message, since any message can contain arbitrary byte blobs (e.g. via the 
> `Data` type).
>
> If what you describe is a requirement for your use case, you could 
> accomplish it with an additional framing layer.
>
> Option 1: Choose a 128-bit unguessable random number before you start 
> writing. Write that number before each message. Now you can scan the bytes 
> of the file looking for this 128-bit sequence and, if you see it, you can 
> be fairly certain (false-positive probability ~= 2^-128) that a new message 
> starts after it. You have to use a new random number for every file in case 
> you ever embed a whole file into another file.
>
> Option 2: Choose a magic number to write before each message, *and* scan 
> the contents of each message for this number, replacing it with an "escape 
> sequence" if seen. Do the opposite transformation while reading. This 
> allows you to detect boundaries "perfectly" (zero probability of false 
> positive) but you lose the benefits of zero-copy due to the need to process 
> escape sequences.
>
> -Kenton
>
> On Fri, Apr 14, 2017 at 12:35 PM, <[email protected]> wrote:
>
>> I have a message that serializes into 24 bytes. I write two messages to a 
>> file, resulting in a file that's 48 bytes long. Now I truncate the file to 
>> 40 bytes and write one more message, so the file now looks like this: one 
>> full message, one broken message, one full message. Is there any way to 
>> iterate over the file and, on encountering the broken message, detect that 
>> it is broken and skip directly to the second full message? I've been using 
>> Python to read such a file with the following code:
>>
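>> import capnp  # pycapnp; importing it enables loading of *_capnp modules
>> import date_capnp
>>
>> def main():
>>     # Binary mode: the file holds serialized messages, not text.
>>     with open('dates.txt', 'rb') as fp:
>>         for date in date_capnp.Date.read_multiple(fp):
>>             print(date)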
>>
>> But it fails with the following message:
>>
>> Message contains non-struct pointer where struct pointer was expected
>>
>> Also, if it's possible to detect such a message, is it possible to get its 
>> position and length? Thank you.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Cap'n Proto" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> Visit this group at https://groups.google.com/group/capnproto.
>>
>
>
