Attachment upload speed varies widely based on how it is uploaded
-----------------------------------------------------------------
Key: COUCHDB-1192
URL: https://issues.apache.org/jira/browse/COUCHDB-1192
Project: CouchDB
Issue Type: Question
Components: HTTP Interface
Affects Versions: 1.0.2
Environment: OS X 10.6.7, MacBook Pro (7200 RPM disk)
             CouchDBX 1.0.2
             couchdb-python used as client code
Reporter: Eli Stevens
Priority: Minor
Running the following code on a MacBook Pro with CouchDBX 1.0.2 (everything
local), we see the following output when attaching a file containing 10MB of
random data:
Code: https://gist.github.com/bc0c36f36be0c85e2a36
Output:
Using put_attachment: 0.309157133102
post time: 2.5557808876
Using multipart: 2.61283898354
Encoding base64: 0.0497629642487
Updating: 5.0550069809
Server log: https://gist.github.com/a80a495fd35049ff871f (there's a
HEAD/DELETE/PUT/GET cycle that's just cleanup)
The calls in question are:
Using put_attachment: 0.309157133102
1> [info] [<0.27809.7>] 127.0.0.1 - - 'PUT' /benchmark_entity/bigfile/smallfile?rev=81-c538b38a8463952f0136143cfa49e9fa 201
Using multipart: 2.61283898354 (post time: 2.5557808876)
1> [info] [<0.27809.7>] 127.0.0.1 - - 'POST' /benchmark_entity/bigfile 201
Updating: 5.0550069809
1> [info] [<0.27809.7>] 127.0.0.1 - - 'POST' /benchmark_entity/_bulk_docs 201
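For reference, here is a minimal sketch of the three request bodies being
compared, built with the Python standard library only. It is not the code from
the gist: the function names are hypothetical, and the "_attachments" form
field name is an assumption about what the multipart handler expects. It only
shows how the payloads differ in shape and size, not why the server takes
longer on two of them.

```python
import base64
import json
import os


def raw_put_body(data):
    """Body for PUT /db/doc/attachment: the bytes are sent unchanged."""
    return data


def multipart_form_body(data, filename, boundary):
    """Body for a multipart/form-data POST to /db/doc with one file part.

    The "_attachments" field name is an assumption, not taken from the gist.
    """
    head = (
        "--{b}\r\n"
        'Content-Disposition: form-data; name="_attachments"; '
        'filename="{f}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).format(b=boundary, f=filename).encode("ascii")
    tail = "\r\n--{b}--\r\n".format(b=boundary).encode("ascii")
    return head + data + tail


def inline_base64_doc(data, name):
    """Body for POST /db/_bulk_docs with the attachment inlined as base64."""
    doc = {
        "_attachments": {
            name: {
                "content_type": "application/octet-stream",
                "data": base64.b64encode(data).decode("ascii"),
            }
        }
    }
    return json.dumps({"docs": [doc]}).encode("ascii")


if __name__ == "__main__":
    payload = os.urandom(10 * 1024 * 1024)  # 10MB of random data
    print("raw PUT bytes:     ", len(raw_put_body(payload)))
    print("multipart bytes:   ",
          len(multipart_form_body(payload, "smallfile", "BOUNDARY")))
    print("inline JSON bytes: ", len(inline_base64_doc(payload, "smallfile")))
```

The raw and multipart bodies are nearly the same size, while the base64 body
is about a third larger and has to be JSON-parsed and decoded on the server.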
Profiling shows 1.5 sec of CPU time in our client code (including setup and
cleanup code that is not part of the timings above) out of 11.8 sec of total
run time, which roughly matches up with the PUT/POST times above. I feel
pretty confident that the bulk of the time is spent not in our client code but
in CouchDB's handling of the requests. We haven't conclusively ruled out
couchdb-python behaving very oddly, though that seems very unlikely.
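One way to separate client CPU time from time spent waiting on the server is
to record wall-clock and process CPU time around each call. The sketch below
is an illustration of that idea, not the profiling we actually ran; `measure`
is a hypothetical helper name.

```python
import time


def measure(fn, *args, **kwargs):
    """Run fn and return (result, wall_seconds, cpu_seconds).

    wall_seconds includes time blocked on the network or the server;
    cpu_seconds counts only CPU time used by this client process.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = fn(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


if __name__ == "__main__":
    # time.sleep stands in for a request that spends its time waiting.
    _, wall, cpu = measure(time.sleep, 0.2)
    print("wall: %.3f  cpu: %.3f  waiting: %.3f" % (wall, cpu, wall - cpu))
```

A large gap between the wall and CPU figures for a given upload call would
point at time spent outside the client process, which with everything local
means CouchDB or the loopback connection.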
Why is the form/multipart handler so much slower than a bare PUT of the
attachment? Why is the base64 approach slower still? Is it due to bandwidth,
CouchDB CPU usage, or something else? If needed, we can upgrade to 1.1 and
test there.
Note that the curl code doesn't seem to yield the same MD5 when we read the
attachment back out, so I've snipped the output related to it.
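The round-trip check mentioned above can be sketched as follows. The helper
names are hypothetical, and the "md5-" plus base64 form is an assumption about
how CouchDB reports the digest in attachment stubs, so treat that part as
unverified against 1.0.2.

```python
import base64
import hashlib


def md5_hex(data):
    """Hex MD5 of the attachment bytes, as printed by command-line md5 tools."""
    return hashlib.md5(data).hexdigest()


def couch_style_digest(data):
    """MD5 in the assumed stub form: "md5-" followed by the base64 digest."""
    raw = hashlib.md5(data).digest()
    return "md5-" + base64.b64encode(raw).decode("ascii")


def same_content(sent, received):
    """True when the bytes read back match the bytes that were uploaded."""
    return md5_hex(sent) == md5_hex(received)
```

A mismatch for only one upload path would suggest that path is altering the
bytes in transit (for example, a stray trailing newline) rather than a storage
problem.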
Thanks for any help,
Eli