Thanks for getting this started, Bob! At the risk of derailing this right off the bat: is there a potential approach 4) where the CouchDB side lets you specify “attachment backends”? One of them could be 2), another could be “node-local file storage”*, another could be S3-API compatible, etc.
*a bunch of heavy handwaving about how to ensure consistency and fault tolerance here.

* * *

My hypothetical 4) could also be a later addition, and we’ll do one of 1-3 first.

* * *

From 1-3, I think 2 is most pragmatic in terms of keeping desirable functionality, while limiting it so it can be useful in practice.

I feel strongly about not dropping attachment support. While not ideal in all cases, it is an extremely useful and reasonably popular feature.

Best
Jan
--

> On 28. Feb 2019, at 11:22, Robert Newson <rnew...@apache.org> wrote:
>
> Hi All,
>
> We've not yet discussed attachments in terms of the foundationdb work so
> here's where we do that.
>
> Today, CouchDB allows you to store large binary values, stored as a series of
> much smaller chunks. These "attachments" cannot be indexed, they can only be
> sent and received (you can fetch the whole thing or you can fetch arbitrary
> subsets of them).
>
> On the FDB side, we have a few constraints. A transaction cannot be more than
> 10MB and cannot take more than 5 seconds.
>
> Given that, there are a few paths to attachment support going forward;
>
> 1) Drop native attachment support.
>
> I suspect this is not going to be a popular approach but it's worth hearing a
> range of views. Instead of direct attachment support, a user could store the
> URL to the large binary content and could simply fetch that URL directly.
>
> 2) Write attachments into FDB but with limits.
>
> The next simplest is to write the attachments into FDB as a series of
> key/value entries, where the key is {database_name, doc_id, attachment_name,
> 0..N} and the value is a short byte array (say, 16K to match current). The
> 0..N is just a counter such that we can do an fdb range get / iterator to
> retrieve the attachment. An embellishment would restore the http Range header
> options, if we still wanted that (disclaimer: I implemented the Range thing
> many years ago, I'm happy to drop support if no one really cares for it in
> 2019).
>
> This would be subject to the 10MB and 5s limits, which is less than you _can_
> do today with attachments but not, in my opinion, any less than people
> actually do (with some notable outliers like npm in the past).
>
> 3) Full functionality
>
> This would be the same as today. Attachments of arbitrary size (up to the
> disk capacity of the fdb cluster). It would require some extra cleverness to
> work over multiple transactions and in such a way that an aborted upload
> doesn't leave partially uploaded data in fdb forever. I have not sat down and
> designed this yet, hence I would very much like to hear from the community as
> to which of these paths are sufficient.
>
> --
> Robert Samuel Newson
> rnew...@apache.org

--
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/
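
For illustration only, here is a rough sketch of the key layout described in option 2), written in Python rather than anything CouchDB would actually ship. A plain dict stands in for FDB (no real FoundationDB client is used), all function names are hypothetical, the 10MB limit is assumed to mean 10,000,000 bytes, and the size check ignores key overhead. It also shows why the 0..N counter would make the HTTP Range option cheap: a byte range maps onto a contiguous run of chunk keys.

```python
from typing import Dict, Iterator, Tuple

CHUNK_SIZE = 16 * 1024          # 16K chunks, as suggested in the proposal
TXN_LIMIT = 10 * 1000 * 1000    # assumed reading of the 10MB transaction limit

# Key shape from the proposal: {database_name, doc_id, attachment_name, 0..N}
Key = Tuple[str, str, str, int]
Store = Dict[Key, bytes]        # stand-in for FDB, not a real client


def chunk_attachment(db: str, doc_id: str, name: str, data: bytes,
                     chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[Key, bytes]]:
    """Yield (key, value) pairs, one per chunk, numbered from 0."""
    for n, offset in enumerate(range(0, len(data), chunk_size)):
        yield (db, doc_id, name, n), data[offset:offset + chunk_size]


def write_attachment(store: Store, db: str, doc_id: str, name: str,
                     data: bytes) -> int:
    """Write all chunks as if in one transaction; return the chunk count."""
    if len(data) > TXN_LIMIT:
        # Simplification: a real check would also count key sizes.
        raise ValueError("attachment exceeds the single-transaction limit")
    count = 0
    for key, value in chunk_attachment(db, doc_id, name, data):
        store[key] = value
        count += 1
    return count


def read_attachment(store: Store, db: str, doc_id: str, name: str) -> bytes:
    """Emulate a range read over the key prefix, in counter order."""
    prefix = (db, doc_id, name)
    keys = sorted(k for k in store if k[:3] == prefix)
    return b"".join(store[k] for k in keys)


def read_range(store: Store, db: str, doc_id: str, name: str,
               start: int, end: int, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return bytes start..end inclusive (HTTP Range style).

    Assumes 0 <= start <= end < attachment length.
    """
    first, last = start // chunk_size, end // chunk_size
    buf = b"".join(store[(db, doc_id, name, n)] for n in range(first, last + 1))
    skip = start - first * chunk_size
    return buf[skip:skip + (end - start + 1)]
```

Only the chunks that overlap the requested range are touched in `read_range`, so a small range request never has to load the whole attachment.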