Attachment upload speed varies widely based on how it is uploaded
-----------------------------------------------------------------

                 Key: COUCHDB-1192
                 URL: https://issues.apache.org/jira/browse/COUCHDB-1192
             Project: CouchDB
          Issue Type: Question
          Components: HTTP Interface
    Affects Versions: 1.0.2
         Environment: OSX 10.6.7 MacBook Pro (7200 RPM disk)
                      CouchDBX 1.0.2
                      couchdb-python used as client code
            Reporter: Eli Stevens
            Priority: Minor


Running the following code on a MacBook Pro with CouchDBX 1.0.2 (everything 
local), we see the following output (times in seconds) when attaching a file 
containing 10MB of random data:

Code: https://gist.github.com/bc0c36f36be0c85e2a36
Output:

Using put_attachment: 0.309157133102
post time: 2.5557808876
Using multipart: 2.61283898354
Encoding base64: 0.0497629642487
Updating: 5.0550069809

Server log: https://gist.github.com/a80a495fd35049ff871f (there's a 
HEAD/DELETE/PUT/GET cycle that's just cleanup)

The calls in question are:

Using put_attachment: 0.309157133102
1> [info] [<0.27809.7>] 127.0.0.1 - - 'PUT' 
/benchmark_entity/bigfile/smallfile?rev=81-c538b38a8463952f0136143cfa49e9fa 201

Using multipart: 2.61283898354 (post time: 2.5557808876) 
1> [info] [<0.27809.7>] 127.0.0.1 - - 'POST' /benchmark_entity/bigfile 201

Updating: 5.0550069809
1> [info] [<0.27809.7>] 127.0.0.1 - - 'POST' /benchmark_entity/_bulk_docs 201
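
For reference, a rough sketch of how the bare PUT and the base64/_bulk_docs
requests differ on the wire. It only builds the requests and sends nothing, so
it says nothing about where the server spends its time. The helper names, the
placeholder revision "1-abc", the content type, and the 1 MiB stand-in payload
are assumptions for illustration (Python 3), not taken from the gist:

```python
import base64
import json
import os


def raw_put_request(db, doc_id, name, rev, data):
    """Standalone attachment: the bytes travel unmodified as the PUT body."""
    return {
        "method": "PUT",
        "path": f"/{db}/{doc_id}/{name}?rev={rev}",
        "headers": {"Content-Type": "application/octet-stream"},
        "body": data,
    }


def inline_bulk_request(db, doc_id, name, rev, data):
    """Inline attachment: the bytes are base64-encoded inside the JSON doc."""
    doc = {
        "_id": doc_id,
        "_rev": rev,
        "_attachments": {
            name: {
                "content_type": "application/octet-stream",
                "data": base64.b64encode(data).decode("ascii"),
            }
        },
    }
    return {
        "method": "POST",
        "path": f"/{db}/_bulk_docs",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"docs": [doc]}).encode("utf-8"),
    }


# Hypothetical values: 1 MiB of random data and a made-up revision.
payload = os.urandom(1024 * 1024)
raw = raw_put_request("benchmark_entity", "bigfile", "smallfile", "1-abc", payload)
inline = inline_bulk_request("benchmark_entity", "bigfile", "smallfile", "1-abc", payload)

print("raw PUT body bytes:    ", len(raw["body"]))
print("inline JSON body bytes:", len(inline["body"]))
```

The inline body is at least 4/3 the size of the raw one, and the server has to
parse the JSON and decode the base64 before it can store the attachment.
Whether that accounts for the gap measured above is an open question.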

Profiling shows 1.5 sec of CPU time in our client code (including setup and 
cleanup code that isn't counted in the times above) out of 11.8 sec of total 
run time, which roughly matches up with the PUT/POST times above.  Basically, 
I feel pretty confident that the bulk of the time above is not spent in our 
client code, and is instead CouchDB's handling time.  We haven't conclusively 
ruled out couchdb-python behaving very oddly, though that seems very unlikely.
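
One way to make the CPU-versus-wall-clock comparison explicit is sketched
below. It assumes Python 3 (time.perf_counter and time.process_time); the
measure() helper and the fake_upload() stand-in are hypothetical and are not
the code from the gist. A large gap between wall time and CPU time means the
client process was mostly blocked waiting rather than computing:

```python
import time


def measure(fn, *args, **kwargs):
    """Run fn and return (result, wall_seconds, cpu_seconds) for this process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = fn(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


def fake_upload(seconds):
    """Stand-in for a blocking HTTP call: sleeps instead of talking to a server."""
    time.sleep(seconds)
    return "ok"


result, wall, cpu = measure(fake_upload, 0.2)
print("wall=%.3f cpu=%.3f waiting=%.3f" % (wall, cpu, wall - cpu))
```

Wrapping each upload call this way would show, per method, how much of the
elapsed time is client CPU and how much is spent waiting on the server.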

Why is the multipart form handler so much slower than using a bare PUT on the 
attachment?  Why is the base64 approach even slower?  Is it due to bandwidth 
issues, CouchDB CPU usage...?  If needed, we can update to 1.1 and test there.

Note that the curl code doesn't seem to result in the same MD5 when we get the 
attachment back out, so I've snipped the output related to that.
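
A minimal sketch of the kind of round-trip check meant here, assuming Python 3.
The helper names are hypothetical, and the "mangled" bytes simulate a lossy
transfer (newline bytes dropped) purely for illustration; this is not a
diagnosis of what the curl code actually did:

```python
import hashlib


def md5_hex(data):
    """Hex MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


def first_difference(a, b):
    """Index of the first differing byte, or None if a and b are identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Deterministic stand-in for the uploaded file: every byte value, 16 times.
original = bytes(range(256)) * 16
# Simulated bad round trip: all newline (0x0a) bytes were lost in transit.
mangled = original.replace(b"\n", b"")

print("md5 match:", md5_hex(original) == md5_hex(mangled))
print("lengths:", len(original), len(mangled))
print("first differing offset:", first_difference(original, mangled))
```

Comparing lengths and the first differing offset, not just the digests, gives
a hint about what kind of corruption happened when the MD5s disagree.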

Thanks for any help,
Eli

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
