[
https://issues.apache.org/jira/browse/COUCHDB-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197926#comment-13197926
]
Dave Cottlehuber commented on COUCHDB-769:
------------------------------------------
This seems really useful, especially when coupled with CORS in future.
## Possible use cases
#1 I want a lean and mean couch, but attachments are on the server's FS
- external API is unchanged
- we redirect "large" attachments to file system per rnewson's approach
- couch still streams/returns the file
- should be per-db configurable as to where the bits are put
- may need some form of hashing/buckets to spread out across a directory
structure
- sanitise path & filename to avoid security holes (uridecode, trim to basename)
- if the attachment is not present we'd need to issue something like
417 Expectation Failed (assuming that we are acting as a proxy here).
A 404 doesn't feel right, and using 417 would also make these very easy
to track down in logs. 500 or 501 would be OK too.
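The sanitising and bucketing bullets for #1 could be sketched roughly as follows. This is only an illustration, not what the attached patch does: the function names, the SHA-1 based bucket layout and the `db_dir` parameter (a directory already chosen for one database) are all assumptions.

```python
import hashlib
import posixpath
from pathlib import PurePosixPath
from urllib.parse import unquote


def sanitise_name(raw_name: str) -> str:
    """URI-decode the name and trim it to its basename."""
    decoded = unquote(raw_name)
    # Treat backslashes as separators too, so Windows-style paths are trimmed.
    base = posixpath.basename(decoded.replace("\\", "/"))
    if base in ("", ".", "..") or "\x00" in base:
        raise ValueError(f"unsafe attachment name: {raw_name!r}")
    return base


def attachment_path(db_dir: str, doc_id: str, raw_name: str) -> PurePosixPath:
    """Map (doc, name) to a bucketed path below the per-db directory."""
    name = sanitise_name(raw_name)
    digest = hashlib.sha1(f"{doc_id}/{name}".encode("utf-8")).hexdigest()
    # Hypothetical layout: two levels of 256 buckets keep each directory small.
    return PurePosixPath(db_dir) / digest[:2] / digest[2:4] / f"{digest}-{name}"
```

Because the directory components come only from the hash and the file name is reduced to a basename, a request for `..%2F..%2Fetc%2Fpasswd` ends up as a file called `<hash>-passwd` inside the per-db directory rather than escaping it.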
#2 Host these attachments somewhere else, just store metadata and redirect
- external request API is unchanged *however* the response would be a redirect
- POST would also need to change
- couch instead issues a redirect (302 Found), so that future requests still
go via couch
- no sanitisation of path required
- might need to be per-db or per-server configurable to ensure public couches
don't become easy targets for spam referrers
- couch doesn't stream/return the file itself
POST
{
  "_id" : "redirect302",
  "meta": "data",
  "_attachments" : {
    "fox.png" : {
      "content-type" : "image/png",
      "uri" : "http://your.bucket.s3.amazonaws.com/fox.png"
    }
  }
}
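To make the #2 flow concrete, here is a minimal sketch of how a GET for an attachment might be answered from a document shaped like the one above. The function name and the (status, headers) return shape are hypothetical, and answering 404 for a name that is not in `_attachments` at all is my assumption.

```python
from typing import Dict, Tuple


def attachment_response(doc: dict, name: str) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a GET of doc/name under use case #2."""
    att = doc.get("_attachments", {}).get(name)
    if att is None:
        # The document has no attachment of that name at all.
        return 404, {}
    uri = att.get("uri")
    if uri is None:
        # Ordinary attachment held inside couch: the usual streaming path.
        content_type = att.get("content-type", "application/octet-stream")
        return 200, {"Content-Type": content_type}
    # External attachment: couch keeps only metadata and redirects.
    # 302 rather than 301, so clients keep asking couch on later requests.
    return 302, {"Location": uri}
```

For the `redirect302` document, a request for `fox.png` would give `302` with `Location: http://your.bucket.s3.amazonaws.com/fox.png`.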
## Per-server configuration.
For #1 and #2 we would then have:
[attachments]
redirection_handler = true ; <doc>._attachments.<name> redirects to ...uri
filestore_handler = true ; enable storing large attachments on filesystem
filestore_threshold = 1048576 ; size in bytes above which attachments go to the filesystem
filestore_dir = /var/lib/couchdb/attachments/ ; each couch has a named subdir
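As a rough sketch of how these settings might drive the decision, the snippet below reads such a section with Python's `configparser` and decides whether an attachment goes to the filesystem. The key names are the ones proposed above; the helper names and the strict "greater than" comparison are assumptions.

```python
import configparser

SAMPLE = """
[attachments]
redirection_handler = true ; redirect to the stored uri
filestore_handler = true ; store large attachments on filesystem
filestore_threshold = 1048576 ; size in bytes
filestore_dir = /var/lib/couchdb/attachments/ ; one subdir per couch
"""


def load_settings(text: str) -> dict:
    """Parse the proposed [attachments] section, stripping ';' comments."""
    parser = configparser.ConfigParser(inline_comment_prefixes=(";",))
    parser.read_string(text)
    section = parser["attachments"]
    return {
        "redirect": section.getboolean("redirection_handler", fallback=False),
        "filestore": section.getboolean("filestore_handler", fallback=False),
        "threshold": section.getint("filestore_threshold", fallback=1048576),
        "dir": section.get("filestore_dir", fallback=""),
    }


def goes_to_filestore(settings: dict, size_bytes: int) -> bool:
    """True when the handler is on and the attachment exceeds the threshold."""
    return settings["filestore"] and size_bytes > settings["threshold"]
```

With the sample values, a 2 MB attachment would be written under `filestore_dir`, while one of exactly 1048576 bytes would stay inside the .couch file.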
## Considerations.
I think we should preserve the current _attachments structure and potential
user-provided metadata, even if the actual attachments are stored elsewhere.
MD5 and similar checks should still be feasible using this.
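A minimal sketch of such a check, assuming the stored metadata keeps a `digest` field in the "md5-" plus base64 form that couch reports for ordinary attachments. Whether externally stored attachments would carry that field is an assumption, and the helper names are hypothetical.

```python
import base64
import hashlib


def couch_style_digest(data: bytes) -> str:
    """MD5 of the bytes, rendered as 'md5-' followed by the base64 digest."""
    raw = hashlib.md5(data).digest()
    return "md5-" + base64.b64encode(raw).decode("ascii")


def matches_metadata(meta: dict, data: bytes) -> bool:
    """Compare externally stored bytes against the digest kept in the doc."""
    return meta.get("digest") == couch_style_digest(data)
```

The same comparison works whether the bytes come from the local filestore in #1 or are fetched from the remote `uri` in #2.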
In #1 it should not be possible to exploit the server to expose data by fiddling
with pathnames and filenames.
I would imagine in a BigCouch scenario that #1 presents some further
challenges. Using #2 and "uri": "file://nfsmount/somefile" won't work
as it leaks server implementation and may be exploitable.
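One way to rule out `file://` and similar URIs at write time is a scheme allow-list, optionally combined with a per-db or per-server host list to address the spam-referrer concern. This is a sketch only; the function name and the `allowed_hosts` setting are hypothetical.

```python
from typing import Optional, Set
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_acceptable_redirect(uri: str, allowed_hosts: Optional[Set[str]] = None) -> bool:
    """Accept only http(s) URIs, optionally restricted to known hosts."""
    parts = urlsplit(uri)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        # Rejects file://, relative references and anything without a host.
        return False
    if allowed_hosts is not None and parts.hostname not in allowed_hosts:
        # Hypothetical host list, so a public couch cannot redirect anywhere.
        return False
    return True
```

With this check `file://nfsmount/somefile` is refused outright, and passing `allowed_hosts={"your.bucket.s3.amazonaws.com"}` would limit redirects to that one bucket.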
#2 might also be useful for people running their infrastructure within a
cloud provider like AWS S3, who might want to serve their attachments using
couch as a proxy, rather than expose the external URI.
> Store large attachments external to the .couch file
> ---------------------------------------------------
>
> Key: COUCHDB-769
> URL: https://issues.apache.org/jira/browse/COUCHDB-769
> Project: CouchDB
> Issue Type: New Feature
> Components: Database Core
> Reporter: Robert Newson
> Attachments: external_attachments_alpha.patch
>
>
> For attachment-heavy applications storing the attachments in separate files
> significantly eases compaction problems.