[ 
https://issues.apache.org/jira/browse/COUCHDB-558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778336#action_12778336
 ] 

Filipe Manana commented on COUCHDB-558:
---------------------------------------

Hum,

The mochiweb_multipart:parse_headers function calls 
mochiweb_util:parse_header, which expects header values of the form 
"X; Y=Z", as far as I understood from the source and its test function:

test_parse_header() ->
    {"multipart/form-data", [{"boundary", "AaB03x"}]} =
        parse_header("multipart/form-data; boundary=AaB03x"),
    ok.

I've only just discovered this: 
http://www.erlang.org/doc/man/erlang.html#decode_packet-3
Maybe if we pass it the full trailer binary, it will be able to decode it as an 
HTTP header. To be tested.
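
A minimal sketch of that idea, still to be tested against real trailers. The 
module and function names (trailer_sketch, parse_trailer) are hypothetical, 
and it assumes the trailer binary ends with an empty line, which 
decode_packet needs in order to know that a header line is complete:

```erlang
-module(trailer_sketch).
-export([parse_trailer/1]).

%% Decode every header line of a trailer binary with erlang:decode_packet/3.
%% Returns {ok, [{Field, Value}]} or {error, Reason}.
parse_trailer(Bin) ->
    parse_trailer(Bin, []).

parse_trailer(Bin, Acc) ->
    case erlang:decode_packet(httph_bin, Bin, []) of
        {ok, {http_header, _, Field, _, Value}, Rest} ->
            %% Field is an atom for headers the decoder recognizes,
            %% otherwise a binary. Value is a binary.
            parse_trailer(Rest, [{Field, Value} | Acc]);
        {ok, http_eoh, _Rest} ->
            %% Empty line: end of the headers.
            {ok, lists:reverse(Acc)};
        {ok, {http_error, Line}, _Rest} ->
            {error, {bad_header, Line}};
        {more, _} ->
            {error, incomplete};
        {error, Reason} ->
            {error, Reason}
    end.
```

For example, parse_trailer(<<"Content-MD5: Q2hlY2sgSW50ZWdyaXR5IQ==\r\n\r\n">>) 
should give back a single {Field, Value} pair with the base64 digest as Value.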

Regarding the integrity checks of chunked requests, I just had an idea (though 
it is complicated, performs poorly, and is incomplete):

1) In the ChunksFun, as soon as the amount of data (sum of the length of the 
chunks received so far) reaches a certain value X, we start putting the chunks 
in a temporary file. The name of the file is put in the current state #httpd{} 
record.

2) After receiving the whole request, compute the MD5 digest and compare it to 
the given digest. If they do not match, remove the tmp file and the file name 
entry in #httpd{}.

3) The update_req/2 function will no longer replace the chunked http request 
with a non-chunked http request.

4) Modify the recv_chunked function (couch_httpd.erl) to check if the given 
#httpd record has a tmp file name in it. If so, it will read each chunk from it 
and pass it to the given ChunkFun callback.
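
Step 1 could be sketched roughly as follows. This is only an illustration: 
the module name, the #spool{} record and the Limit and Path arguments are 
hypothetical stand-ins for the value X and for the file name that would be 
kept in the #httpd{} record, and nothing here touches couch_httpd itself:

```erlang
-module(spool_sketch).
-export([new/2, add_chunk/2, finish/1]).

%% limit: the threshold X in bytes; path: where to spill the chunks.
-record(spool, {limit, path, size = 0, buffer = [], fd = undefined}).

new(Limit, Path) ->
    #spool{limit = Limit, path = Path}.

%% Below the threshold: keep the chunk in memory (newest first).
add_chunk(Chunk, #spool{fd = undefined, size = Size, limit = Limit} = S)
        when Size + byte_size(Chunk) < Limit ->
    S#spool{size = Size + byte_size(Chunk), buffer = [Chunk | S#spool.buffer]};
%% Threshold reached: flush the buffered chunks, then keep writing to disk.
add_chunk(Chunk, #spool{fd = undefined} = S) ->
    {ok, Fd} = file:open(S#spool.path, [write, raw, binary]),
    ok = file:write(Fd, lists:reverse([Chunk | S#spool.buffer])),
    S#spool{size = S#spool.size + byte_size(Chunk), buffer = [], fd = Fd};
%% Already spilling: append the chunk to the tmp file.
add_chunk(Chunk, #spool{fd = Fd} = S) ->
    ok = file:write(Fd, Chunk),
    S#spool{size = S#spool.size + byte_size(Chunk)}.

%% Tell the caller where the request body ended up.
finish(#spool{fd = undefined, buffer = Buffer}) ->
    {memory, iolist_to_binary(lists:reverse(Buffer))};
finish(#spool{fd = Fd, path = Path}) ->
    ok = file:close(Fd),
    {file, Path}.
```

The digest check of step 2 and the read-back of step 4 are left out, and the 
tmp file is not cleaned up here, so problem 3 below applies to this sketch too.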

Major obvious problems:

1) the solution is too complicated
2) poor disk performance if we have large requests and many in parallel
3) after crashes, we risk having useless tmp files lying around on disk
4) to compute the MD5 digest, we still need to read the whole content 
("unchunked") into memory. Do you know of any "incremental" MD5 digest 
implementation? I've never heard of one.
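
On the incremental digest question: the erlang module has the md5_init/0, 
md5_update/2 and md5_final/1 BIFs, which look like they would allow hashing 
chunk by chunk without holding the whole body in memory. A minimal sketch, 
where the module and function names (md5_sketch, digest_chunks, matches) are 
hypothetical, and the header value is assumed to be the base64 encoding of 
the 16-byte digest as described in RFC 1864:

```erlang
-module(md5_sketch).
-export([digest_chunks/1, matches/2]).

%% Fold the chunks through an MD5 context one at a time.
digest_chunks(Chunks) ->
    Ctx = lists:foldl(
        fun(Chunk, Ctx0) -> erlang:md5_update(Ctx0, Chunk) end,
        erlang:md5_init(),
        Chunks),
    erlang:md5_final(Ctx).

%% Compare the computed digest with a Content-MD5 header value
%% (a string or a binary holding the base64 text).
matches(Chunks, ContentMd5) ->
    base64:encode(digest_chunks(Chunks)) =:= iolist_to_binary(ContentMd5).
```

In a real handler the context would be carried in the ChunkFun state rather 
than folded over a list, so each chunk can be dropped once it has been hashed.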

Have you come up with any idea of this sort?

It might be interesting to check how HTTP servers like Apache deal with this 
situation (if they do at all, or whether they just buffer all the chunks in memory).

cheers

> Validate Content-MD5 request headers on uploads
> -----------------------------------------------
>
>                 Key: COUCHDB-558
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-558
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core, HTTP Interface
>            Reporter: Adam Kocoloski
>             Fix For: 0.11
>
>         Attachments: jira-couchdb-558-for-trunk-2nd-try.patch, 
> jira-couchdb-558-for-trunk-3rd-try.patch, jira-couchdb-558-for-trunk.patch, 
> run.tpl.patch
>
>
> We could detect in-flight data corruption if a client sends a Content-MD5 
> header along with the data and Couch validates the MD5 on arrival.
> RFC1864 - The Content-MD5 Header Field
> http://www.faqs.org/rfcs/rfc1864.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
