> From: Linas Vepstas <[email protected]>
> Cc: [email protected]
>
> 2009/5/13 Sebastian Tennant <[email protected]>:
>
> > Restricting regexps to actual text is fine... until
> > you need to grep binary data, or, as in this case,
> > a combination of text and binary data.
>
> > in cgi.scm that extracted the uploaded (possibly
> > binary) file, because the pattern identifying the
> > beginning of the file in the raw data string is
> > simple ("\n\r\n\r") -
>
> No, this sounds somehow broken. If I remember correctly,
> binary mime-parts should have a ContentLength header
> so you can skip over them. If ContentLength is absent,
> then the part should be ascii-encoded (e.g. base64)
> yeah, grepping large blocks of ascii sucks, which is
> why the ContentLength should be used.
>
> -- linas

If the spec says a length indication followed by a fixed length of
arbitrary binary data, then it is not just sucky but incorrect to apply
either grep or a regexp to the binary. It will seem to work until it hits
binary data that "by accident" contains the string you are looking for.
The only correct algorithm is to make a preliminary pass that uses the
stated lengths to remove the binary data and pseudo-concatenate the
remaining strings.

-- Keith
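
To make the "preliminary pass" concrete, here is a minimal sketch in Python.
It assumes a made-up framing, not the real MIME or cgi.scm format: each part
is a block of ASCII header lines, one of which is "Content-Length: N", then a
blank line ("\r\n\r\n"), then exactly N bytes of payload. The names
split_parts and make_part are hypothetical helpers invented for this
illustration.

```python
DELIM = b"\r\n\r\n"  # assumed header/payload separator for this sketch


def split_parts(raw: bytes):
    """Walk length-prefixed parts; return (concatenated header text, payloads)."""
    texts, payloads = [], []
    pos = 0
    while pos < len(raw):
        # Search for the separator only from the start of a header block,
        # never inside a payload.
        end = raw.index(DELIM, pos)
        header = raw[pos:end].decode("ascii")
        length = None
        for line in header.split("\r\n"):
            name, _, value = line.partition(":")
            if name.strip().lower() == "content-length":
                length = int(value)
        if length is None:
            raise ValueError("part has no length header")
        start = end + len(DELIM)
        texts.append(header)
        payloads.append(raw[start:start + length])
        pos = start + length  # skip the payload without looking at its bytes
    return "\r\n".join(texts), payloads


def make_part(name: str, payload: bytes) -> bytes:
    """Build one part in the assumed framing (hypothetical helper)."""
    head = f"X-Name: {name}\r\nContent-Length: {len(payload)}"
    return head.encode("ascii") + DELIM + payload


# A payload that "by accident" contains the separator and a fake header.
tricky = b"\x00\x01\r\n\r\nX-Fake: 1\r\n\r\n\xff"
raw = make_part("a", tricky) + make_part("b", b"plain")

text, payloads = split_parts(raw)
naive_pieces = raw.split(DELIM)  # pattern search over the whole buffer
```

With this input, split_parts returns the two payloads intact, and the
concatenated header text is safe to hand to a regexp. The naive split cuts
the first payload into pieces and treats "X-Fake: 1" as if it were a header.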
