I would like to see a basic “native” attachment provider with the limitations described in 2), as well as an “object store” provider targeting the S3 API. I think the consistency considerations are tractable if you’re comfortable with the possibility that attachments are orphaned in the object store when a transaction fails.
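To make that concrete, here is a rough sketch of what such a pluggable provider interface might look like. Python is used purely for illustration (CouchDB's own implementation would be Erlang), and the AttachmentProvider / S3Provider names and the use of boto3 are hypothetical, not a proposed API:

    from abc import ABC, abstractmethod

    import boto3  # assumed S3 client, for the sake of the sketch


    class AttachmentProvider(ABC):
        """Hypothetical pluggable backend for attachment bodies.

        Document metadata (including which provider holds an attachment)
        would still live in FoundationDB.
        """

        @abstractmethod
        def put(self, db_name: str, doc_id: str, att_name: str, data: bytes) -> str:
            """Store the bytes and return an opaque location token."""

        @abstractmethod
        def get(self, db_name: str, doc_id: str, att_name: str, token: str) -> bytes:
            """Fetch the bytes previously stored under the token."""


    class S3Provider(AttachmentProvider):
        """Object-store provider targeting the S3 API."""

        def __init__(self, bucket: str):
            self.bucket = bucket
            self.s3 = boto3.client("s3")

        def put(self, db_name, doc_id, att_name, data):
            key = f"{db_name}/{doc_id}/{att_name}"
            # If the enclosing document transaction later fails, this object
            # is orphaned; a periodic sweep or a bucket lifecycle rule would
            # have to reclaim it.
            self.s3.put_object(Bucket=self.bucket, Key=key, Body=data)
            return key

        def get(self, db_name, doc_id, att_name, token):
            return self.s3.get_object(Bucket=self.bucket, Key=token)["Body"].read()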
I had not considered the “just write them on the file system” provider but that’s probably partly my cloud-native blinders. I think the main question there is redundancy; I would argue against trying to do any sort of replication across local disks. Users who happen to have an NFS-style mount point accessible to all the CouchDB nodes could use this option reliably, though.

We should calculate a safe maximum attachment size for the native provider; as I understand things, the FDB transaction size includes both keys and values, so our effective attachment size limit will be smaller.

Adam

> On Feb 28, 2019, at 6:21 AM, Robert Newson <rnew...@apache.org> wrote:
>
> Hi,
>
> Yes, I agree we should have a framework like that. Folks should be able to choose S3 or COS (IBM), etc.
>
> I am personally on the hook for the implementation for CouchDB and for IBM Cloudant and expect them to be different, so the framework, IMO, is a given.
>
> B.
>
>> On 28 Feb 2019, at 10:33, Jan Lehnardt <j...@apache.org> wrote:
>>
>> Thanks for getting this started, Bob!
>>
>> In fear of derailing this right off the bat, is there a potential 4) approach where on the CouchDB side there is a way to specify “attachment backends”, one of which could be 2), but others could be “node local file storage”*, others could be S3-API compatible, etc?
>>
>> *a bunch of heavy handwaving about how to ensure consistency and fault tolerance here.
>>
>> * * *
>>
>> My hypothetical 4) could also be a later addition, and we’ll do one of 1-3 first.
>>
>> * * *
>>
>> From 1-3, I think 2 is most pragmatic in terms of keeping desirable functionality, while limiting it so it can be useful in practice.
>>
>> I feel strongly about not dropping attachment support. While not ideal in all cases, it is an extremely useful and reasonably popular feature.
>>
>> Best
>> Jan
>> —
>>
>>> On 28. Feb 2019, at 11:22, Robert Newson <rnew...@apache.org> wrote:
>>>
>>> Hi All,
>>>
>>> We've not yet discussed attachments in terms of the foundationdb work so here's where we do that.
>>>
>>> Today, CouchDB allows you to store large binary values, stored as a series of much smaller chunks. These "attachments" cannot be indexed, they can only be sent and received (you can fetch the whole thing or you can fetch arbitrary subsets of them).
>>>
>>> On the FDB side, we have a few constraints. A transaction cannot be more than 10MB and cannot take more than 5 seconds.
>>>
>>> Given that, there are a few paths to attachment support going forward;
>>>
>>> 1) Drop native attachment support.
>>>
>>> I suspect this is not going to be a popular approach but it's worth hearing a range of views. Instead of direct attachment support, a user could store the URL to the large binary content and could simply fetch that URL directly.
>>>
>>> 2) Write attachments into FDB but with limits.
>>>
>>> The next simplest is to write the attachments into FDB as a series of key/value entries, where the key is {database_name, doc_id, attachment_name, 0..N} and the value is a short byte array (say, 16K to match current). The 0..N is just a counter such that we can do an fdb range get / iterator to retrieve the attachment. An embellishment would restore the http Range header options, if we still wanted that (disclaimer: I implemented the Range thing many years ago, I'm happy to drop support if no one really cares for it in 2019).
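As an illustration of the chunked layout Bob describes in 2) above, here is a minimal sketch using the FoundationDB Python bindings (CouchDB would go through erlfdb; the function names and the 16 KiB chunk size are placeholders matching the description, not an agreed design):

    import fdb

    fdb.api_version(630)
    db = fdb.open()

    CHUNK = 16 * 1024  # 16 KiB values, matching the current attachment chunking


    @fdb.transactional
    def write_attachment(tr, db_name, doc_id, att_name, data):
        # One key/value pair per chunk: {db_name, doc_id, att_name, chunk_index}.
        # The whole write happens in one transaction, and the key bytes count
        # against the ~10 MB transaction limit alongside the 16 KiB values, so
        # the safe maximum attachment size is somewhat below 10 MB.
        for i in range(0, len(data), CHUNK):
            key = fdb.tuple.pack((db_name, doc_id, att_name, i // CHUNK))
            tr[key] = data[i:i + CHUNK]


    @fdb.transactional
    def read_attachment(tr, db_name, doc_id, att_name):
        # A single range read returns the chunks in order; an HTTP Range request
        # would only need to read the chunks covering the requested byte range.
        r = fdb.tuple.range((db_name, doc_id, att_name))
        return b"".join(kv.value for kv in tr.get_range(r.start, r.stop))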
>>>
>>> This would be subject to the 10MB and 5s limits, which is less than you _can_ do today with attachments but not, in my opinion, any less than people actually do (with some notable outliers like npm in the past).
>>>
>>> 3) Full functionality
>>>
>>> This would be the same as today. Attachments of arbitrary size (up to the disk capacity of the fdb cluster). It would require some extra cleverness to work over multiple transactions and in such a way that an aborted upload doesn't leave partially uploaded data in fdb forever. I have not sat down and designed this yet, hence I would very much like to hear from the community as to which of these paths are sufficient.
>>>
>>> --
>>> Robert Samuel Newson
>>> rnew...@apache.org
>>
>> --
>> Professional Support for Apache CouchDB:
>> https://neighbourhood.ie/couchdb-support/
>
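For 3), one possible shape of that extra cleverness is to stage chunks under a separate upload keyspace in many small transactions and only link them to the document in a final commit, with a sweeper that reclaims uploads that never committed. A hypothetical sketch, again in Python against the fdb bindings; the keyspace names, marker scheme, and timings are invented for illustration only:

    import time

    import fdb

    fdb.api_version(630)
    db = fdb.open()

    STALE_AFTER = 24 * 60 * 60  # reclaim uploads that have been idle for a day


    @fdb.transactional
    def stage_chunk(tr, upload_id, index, data):
        # Each chunk (or small batch of chunks) is its own transaction, so the
        # attachment as a whole is no longer bounded by the 10 MB / 5 second
        # transaction limits. The marker records that an upload is in flight.
        tr[fdb.tuple.pack(("upload_marker", upload_id))] = fdb.tuple.pack((int(time.time()),))
        tr[fdb.tuple.pack(("upload_chunks", upload_id, index))] = data


    @fdb.transactional
    def commit_upload(tr, db_name, doc_id, att_name, upload_id):
        # Final, small transaction: link the document to the staged chunks and
        # drop the marker so the sweeper leaves them alone.
        tr[fdb.tuple.pack((db_name, doc_id, att_name))] = upload_id.encode()
        tr.clear(fdb.tuple.pack(("upload_marker", upload_id)))


    @fdb.transactional
    def sweep_aborted_uploads(tr):
        # Uploads that were never committed still carry a marker; once it is
        # old enough, clear the staged chunks so they don't sit in fdb forever.
        # (A real sweeper would page through this range in smaller transactions.)
        now = int(time.time())
        markers = fdb.tuple.range(("upload_marker",))
        for kv in tr.get_range(markers.start, markers.stop):
            _, upload_id = fdb.tuple.unpack(kv.key)
            (started,) = fdb.tuple.unpack(kv.value)
            if now - started > STALE_AFTER:
                chunks = fdb.tuple.range(("upload_chunks", upload_id))
                tr.clear_range(chunks.start, chunks.stop)
                tr.clear(kv.key)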