Thanks for getting this started, Bob!

At the risk of derailing this right off the bat, is there a potential 4) 
approach where CouchDB lets you specify “attachment backends”, one of which 
could be 2), while others could be “node local file storage”*, 
S3-API-compatible storage, etc.?

*a bunch of heavy hand-waving about how to ensure consistency and fault 
tolerance here.
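To make the idea of pluggable “attachment backends” a bit more concrete, here 
is a minimal Python sketch. Everything in it is hypothetical: the 
AttachmentBackend interface, the in-memory LocalBackend standing in for node 
local storage, and the per-database "attachment_backend" setting are 
illustrative names, not existing CouchDB APIs. It also does nothing about the 
consistency and fault-tolerance questions in the footnote.

```python
from typing import Dict, Optional, Protocol, Tuple

# Hypothetical key: (database_name, doc_id, attachment_name)
AttKey = Tuple[str, str, str]


class AttachmentBackend(Protocol):
    """Hypothetical interface every attachment backend would implement."""

    def put(self, key: AttKey, data: bytes) -> None: ...

    def get(self, key: AttKey) -> Optional[bytes]: ...

    def delete(self, key: AttKey) -> None: ...


class LocalBackend:
    """Toy stand-in for "node local file storage"; keeps bytes in memory."""

    def __init__(self) -> None:
        self._blobs: Dict[AttKey, bytes] = {}

    def put(self, key: AttKey, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: AttKey) -> Optional[bytes]:
        return self._blobs.get(key)

    def delete(self, key: AttKey) -> None:
        self._blobs.pop(key, None)


# Hypothetical registry; an FDB-chunked (option 2) or S3-compatible backend
# would be registered here under its own name.
BACKENDS: Dict[str, AttachmentBackend] = {"local": LocalBackend()}


def backend_for(db_config: Dict[str, str]) -> AttachmentBackend:
    """Pick the backend named in a (hypothetical) per-database setting."""
    return BACKENDS[db_config.get("attachment_backend", "local")]
```

The point of the interface is that the document layer only ever calls put, 
get and delete, so which storage actually holds the bytes becomes a 
per-database configuration choice.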

* * *

My hypothetical 4) could also be a later addition, in which case we’d do one 
of 1-3 first.

* * *

Of 1-3, I think 2 is the most pragmatic: it keeps the desirable functionality 
while limiting it enough that it stays useful in practice.

I feel strongly about not dropping attachment support. While not ideal in all 
cases, it is an extremely useful and reasonably popular feature.

Best
Jan
—

> On 28. Feb 2019, at 11:22, Robert Newson <rnew...@apache.org> wrote:
> 
> Hi All,
> 
> We've not yet discussed attachments in terms of the FoundationDB work, so 
> here's where we do that.
> 
> Today, CouchDB allows you to store large binary values, which are stored as a 
> series of much smaller chunks. These "attachments" cannot be indexed; they can 
> only be sent and received (you can fetch the whole thing or arbitrary subsets 
> of it).
> 
> On the FDB side, we have a few constraints. A transaction cannot be more than 
> 10MB and cannot take more than 5 seconds.
> 
> Given that, there are a few paths to attachment support going forward:
> 
> 1) Drop native attachment support. 
> 
> I suspect this is not going to be a popular approach, but it's worth hearing a 
> range of views. Instead of direct attachment support, a user could store a URL 
> pointing to the large binary content and fetch that URL directly.
> 
> 2) Write attachments into FDB but with limits.
> 
> The next simplest option is to write the attachments into FDB as a series of 
> key/value entries, where the key is {database_name, doc_id, attachment_name, 
> 0..N} and the value is a short byte array (say, 16K, to match the current 
> implementation). The 0..N is just a counter, so that we can do an FDB range 
> read (iterator) to retrieve the attachment. An embellishment would restore 
> HTTP Range header support, if we still wanted that (disclaimer: I implemented 
> the Range feature many years ago, and I'm happy to drop support if no one 
> really cares for it in 2019).
> 
> This would be subject to the 10MB and 5s limits, which is less than you _can_ 
> do today with attachments but not, in my opinion, any less than what people 
> actually do (with some notable outliers, like npm in the past).
> 
> 3) Full functionality
> 
> This would be the same as today: attachments of arbitrary size (up to the 
> disk capacity of the FDB cluster). It would require some extra cleverness to 
> work across multiple transactions, and in such a way that an aborted upload 
> doesn't leave partially uploaded data in FDB forever. I have not sat down and 
> designed this yet, which is why I would very much like to hear from the 
> community about which of these paths would be sufficient.
> 
> -- 
>  Robert Samuel Newson
>  rnew...@apache.org
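Option 2 from the quoted message can be sketched in Python as follows. This 
is a minimal illustration under stated assumptions, not a planned 
implementation: a hypothetical in-memory ordered store (InMemoryKV) stands in 
for FDB, keys are plain tuples (database_name, doc_id, attachment_name, N) 
scanned in order the way a range read would return them, and CHUNK_SIZE and 
TXN_BYTE_LIMIT simply reuse the 16K and 10MB figures from the message. It 
shows the write path, a full read, and how an HTTP-Range-style byte range 
maps onto a contiguous run of chunk counters.

```python
import sys
from typing import Dict, Iterator, Tuple

CHUNK_SIZE = 16 * 1024        # 16K per value, the figure suggested in option 2
TXN_BYTE_LIMIT = 10_000_000   # FDB's 10MB per-transaction limit

# (database_name, doc_id, attachment_name, chunk_counter)
Key = Tuple[str, str, str, int]


class InMemoryKV:
    """Hypothetical stand-in for an ordered key/value store such as FDB."""

    def __init__(self) -> None:
        self._data: Dict[Key, bytes] = {}

    def set_many(self, entries: Dict[Key, bytes]) -> None:
        # In FDB, all of these writes would go into a single transaction.
        self._data.update(entries)

    def get_range(self, begin: Key, end: Key) -> Iterator[Tuple[Key, bytes]]:
        # Ordered scan over [begin, end), like an FDB range read.
        for key in sorted(self._data):
            if begin <= key < end:
                yield key, self._data[key]


def chunk_attachment(data: bytes, db: str, doc_id: str, name: str) -> Dict[Key, bytes]:
    """Split an attachment into {db, doc_id, name, N} -> chunk entries."""
    # Rough check only: the real limit also counts key bytes and overhead.
    if len(data) > TXN_BYTE_LIMIT:
        raise ValueError("attachment does not fit in a single transaction")
    return {
        (db, doc_id, name, i // CHUNK_SIZE): data[i:i + CHUNK_SIZE]
        for i in range(0, len(data), CHUNK_SIZE)
    }


def read_all(kv: InMemoryKV, db: str, doc_id: str, name: str) -> bytes:
    """Fetch every chunk in counter order and concatenate them."""
    chunks = kv.get_range((db, doc_id, name, 0), (db, doc_id, name, sys.maxsize))
    return b"".join(value for _, value in chunks)


def read_range(kv: InMemoryKV, db: str, doc_id: str, name: str,
               start: int, end: int) -> bytes:
    """Return bytes start..end inclusive, as an HTTP Range request would."""
    first, last = start // CHUNK_SIZE, end // CHUNK_SIZE
    # Only the chunks overlapping the requested bytes are read.
    chunks = kv.get_range((db, doc_id, name, first), (db, doc_id, name, last + 1))
    joined = b"".join(value for _, value in chunks)
    offset = start - first * CHUNK_SIZE
    return joined[offset:offset + (end - start + 1)]


if __name__ == "__main__":
    kv = InMemoryKV()
    payload = bytes(range(256)) * 200          # 51,200 bytes -> 4 chunks
    kv.set_many(chunk_attachment(payload, "db", "doc1", "photo.jpg"))
    assert read_all(kv, "db", "doc1", "photo.jpg") == payload
    # This range straddles the boundary between chunk 0 and chunk 1.
    assert read_range(kv, "db", "doc1", "photo.jpg", 16380, 16390) == payload[16380:16391]
```

Because each chunk is its own key, a Range request only touches the chunks it 
overlaps. For scale, the 10MB ceiling works out to roughly 610 full 16K 
chunks per transaction (10,000,000 / 16,384 ≈ 610.4), before key bytes are 
counted.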

-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/
