Hi,

Yes, I agree we should have a framework like that. Folks should be able to 
choose S3 or COS (IBM), etc. 

I am personally on the hook for the implementation for CouchDB and for IBM 
Cloudant and expect them to be different, so the framework, IMO, is a given. 

B. 

> On 28 Feb 2019, at 10:33, Jan Lehnardt <j...@apache.org> wrote:
> 
> Thanks for getting this started, Bob!
> 
> In fear of derailing this right off the bat, is there a potential 4) approach 
> where on the CouchDB side there is a way to specify “attachment backends”, 
> one of which could be 2), but others could be “node local file storage”*, 
> others could be S3-API compatible, etc?
> 
> *a bunch of heavy handwaving about how to ensure consistency and fault 
> tolerance here.
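
A rough sketch of what such a pluggable backend contract could look like (CouchDB itself is Erlang; Python here purely for illustration, and every name below is hypothetical):

```python
from abc import ABC, abstractmethod

class AttachmentBackend(ABC):
    """Hypothetical interface each attachment storage backend would implement."""

    @abstractmethod
    def put(self, db, doc_id, name, data):
        """Store the full attachment body."""

    @abstractmethod
    def get(self, db, doc_id, name, offset=0, length=None):
        """Fetch the whole attachment or an arbitrary subrange of it."""

    @abstractmethod
    def delete(self, db, doc_id, name):
        """Remove the attachment."""

class InMemoryBackend(AttachmentBackend):
    """Stand-in for 'node local' storage; an S3- or FDB-backed
    implementation would expose the same three calls."""

    def __init__(self):
        self._blobs = {}

    def put(self, db, doc_id, name, data):
        self._blobs[(db, doc_id, name)] = bytes(data)

    def get(self, db, doc_id, name, offset=0, length=None):
        blob = self._blobs[(db, doc_id, name)]
        end = len(blob) if length is None else offset + length
        return blob[offset:end]

    def delete(self, db, doc_id, name):
        del self._blobs[(db, doc_id, name)]

backend = InMemoryBackend()
backend.put("db", "doc", "a.txt", b"hello world")
assert backend.get("db", "doc", "a.txt", offset=6, length=5) == b"world"
```

The point of the sketch is only that option 2 (FDB-native) becomes one implementation among several, rather than the only code path.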
> 
> * * *
> 
> My hypothetical 4) could also be a later addition, and we’ll do one of 1-3 
> first.
> 
> 
> * * *
> 
> From 1-3, I think 2 is most pragmatic in terms of keeping desirable 
> functionality, while limiting it so it can be useful in practice.
> 
> I feel strongly about not dropping attachment support. While not ideal in all 
> cases, it is an extremely useful and reasonably popular feature.
> 
> Best
> Jan
> —
> 
>> On 28. Feb 2019, at 11:22, Robert Newson <rnew...@apache.org> wrote:
>> 
>> Hi All,
>> 
>> We've not yet discussed attachments in terms of the foundationdb work so 
>> here's where we do that.
>> 
>> Today, CouchDB allows you to store large binary values as a series of much 
>> smaller chunks. These "attachments" cannot be indexed; they can only be sent 
>> and received (you can fetch the whole thing or you can fetch arbitrary 
>> subranges of it).
>> 
>> On the FDB side, we have a few constraints. A transaction cannot be more 
>> than 10MB and cannot take more than 5 seconds.
>> 
>> Given that, there are a few paths to attachment support going forward:
>> 
>> 1) Drop native attachment support. 
>> 
>> I suspect this is not going to be a popular approach but it's worth hearing 
>> a range of views. Instead of direct attachment support, a user could store 
>> the URL to the large binary content and could simply fetch that URL directly.
>> 
>> 2) Write attachments into FDB but with limits.
>> 
>> The next simplest is to write the attachments into FDB as a series of 
>> key/value entries, where the key is {database_name, doc_id, attachment_name, 
>> 0..N} and the value is a short byte array (say, 16 KB, to match the current 
>> chunk size). The 0..N is just a counter so that we can do an FDB range get / 
>> iterator to retrieve the attachment. An embellishment would restore the HTTP 
>> Range header options, if we still wanted that (disclaimer: I implemented the 
>> Range thing many years ago; I'm happy to drop support if no one really cares 
>> for it in 2019).
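
A minimal sketch of that layout (Python, simulating the FDB keyspace with a plain dict; the tuple key shape and the 16 KB chunk size come from the description above, everything else is hypothetical):

```python
CHUNK_SIZE = 16 * 1024  # 16 KB per value, matching CouchDB's current chunking

# Stand-in for the FDB keyspace: tuple keys sort component-by-component,
# which is what makes the 0..N counter usable as a single range scan.
store = {}

def write_attachment(db, doc_id, att_name, data):
    # Split the binary into CHUNK_SIZE pieces keyed by a chunk counter.
    for i in range(0, max(len(data), 1), CHUNK_SIZE):
        store[(db, doc_id, att_name, i // CHUNK_SIZE)] = data[i:i + CHUNK_SIZE]

def read_attachment(db, doc_id, att_name):
    # In real FDB this would be one range get over the key prefix.
    keys = sorted(k for k in store if k[:3] == (db, doc_id, att_name))
    return b"".join(store[k] for k in keys)

blob = b"x" * 40000  # ~40 KB -> 3 chunks
write_attachment("mydb", "doc1", "photo.bin", blob)
assert read_attachment("mydb", "doc1", "photo.bin") == blob
```

Range-header support would fall out of this naturally: seek to chunk `offset // CHUNK_SIZE` and slice the first and last chunks.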
>> 
>> This would be subject to the 10 MB and 5 second limits, which is less than 
>> you _can_ do today with attachments but not, in my opinion, any less than 
>> people actually do (with some notable outliers like npm in the past).
>> 
>> 3) Full functionality
>> 
>> This would be the same as today: attachments of arbitrary size (up to the 
>> disk capacity of the FDB cluster). It would require some extra cleverness to 
>> work across multiple transactions, and in such a way that an aborted upload 
>> doesn't leave partially uploaded data in FDB forever. I have not sat down 
>> and designed this yet, hence I would very much like to hear from the 
>> community as to which of these paths is sufficient.
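
One possible shape for that cleverness, purely as a sketch (in-memory stand-in for FDB; the staging subspace, upload ids, and GC cutoff are all hypothetical): stream chunks into a per-upload staging area across many small transactions, flip a single pointer key in the final commit, and have a background sweep delete staged uploads that were never committed.

```python
import time
import uuid

kv = {}  # stand-in for the FDB keyspace

def stage_chunk(upload_id, i, chunk):
    # Each call would be its own small FDB transaction (< 10 MB, < 5 s).
    kv[("staging", upload_id, "chunk", i)] = chunk
    kv[("staging", upload_id, "ts")] = time.time()  # last-activity marker

def commit_upload(db, doc_id, att_name, upload_id):
    # Final (small) transaction: point the attachment at the staged chunks
    # and drop the activity marker so the GC sweep skips this upload.
    kv[("att", db, doc_id, att_name)] = upload_id
    del kv[("staging", upload_id, "ts")]

def gc_aborted(older_than):
    # Background sweep: remove staged uploads that stalled before commit.
    dead = [k[1] for k, ts in list(kv.items())
            if k[0] == "staging" and k[-1] == "ts" and ts < older_than]
    for uid in dead:
        for k in [k for k in kv if k[:2] == ("staging", uid)]:
            del kv[k]

uid = uuid.uuid4().hex
for i, chunk in enumerate([b"aa", b"bb"]):
    stage_chunk(uid, i, chunk)
commit_upload("db", "doc", "big.bin", uid)

aborted = uuid.uuid4().hex
stage_chunk(aborted, 0, b"partial")
gc_aborted(older_than=time.time() + 1)  # everything uncommitted is "old"
assert ("staging", aborted, "chunk", 0) not in kv
assert kv[("att", "db", "doc", "big.bin")] == uid
```

The commit step is the only transactional hinge: either the pointer key exists (upload visible) or the staged data is eventually swept, so an aborted upload never leaks permanently.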
>> 
>> -- 
>> Robert Samuel Newson
>> rnew...@apache.org
> 
> -- 
> Professional Support for Apache CouchDB:
> https://neighbourhood.ie/couchdb-support/
> 
