Hello pukkamustard,

Interesting proposal, I can see the use for the verification capability.

For my taste, the block size is much too small. I understand 4k can
make sense for page tables and SATA, but looking at benchmarks, 4k is
still too small to maximize SATA throughput. I would also worry about
4k as a request size in any database or network protocol: the
per-request overheads are still too big for modern hardware. You could
easily go to 8k, which could be justified with 9k jumbo frames for
Ethernet and would at least also utilize all of the bits in your paths.
The 32k of ECRS is close to the 64k which is reportedly the optimum for
modern M.2 media. IIRC, torrents even use 256k.

The overhead from padding may be large for very small files if you go
beyond 4k, but you should also think in terms of absolute overhead:
even a 3100% overhead doesn't change the fact that the absolute
overhead is tiny for a 1k file. Furthermore, you should consider a
trick we use in GNUnet-FS, which is that we share *directories*, and
for small files, we simply _inline_ the full file data in the metadata
of the file that is stored with the directory or search result. So you
can basically avoid ever having to download tiny files as separate
entities; for files <32k we have zero overhead this way.

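
To put numbers on the padding argument, here is a minimal sketch. It
assumes that a file is simply padded up to the next multiple of the
block size, that "1k" means 1024 bytes, and that the cost of non-data
(tree) blocks is ignored. The function name padding_overhead is
hypothetical and not part of ERIS, ECRS or GNUnet. Under these
assumptions, the 3100% figure corresponds to a 1k file stored in a
single 32k block.

```python
from typing import Tuple


def padding_overhead(file_size: int, block_size: int) -> Tuple[int, float]:
    """Return (absolute padding in bytes, relative overhead in percent).

    Assumption: content is padded to the next multiple of block_size;
    intermediate tree blocks are not counted.
    """
    if file_size <= 0 or block_size <= 0:
        raise ValueError("file_size and block_size must be positive")
    blocks = -(-file_size // block_size)  # ceiling division
    padded_size = blocks * block_size
    absolute = padded_size - file_size
    relative = 100.0 * absolute / file_size
    return absolute, relative


if __name__ == "__main__":
    # Compare the block sizes mentioned in this thread for a 1k file.
    for block_kib in (4, 8, 32, 64):
        absolute, relative = padding_overhead(1024, block_kib * 1024)
        print(f"{block_kib:>2}k blocks: {absolute} bytes padding "
              f"({relative:.0f}% relative overhead)")
```

The relative number grows quickly with the block size, but the absolute
padding for a single small file never exceeds one block.
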
I'd be curious to see how much the two-pass encoding costs in practice
-- it might be less expensive than ECRS if you are lucky (hashing one
big block being cheaper than many small hash operations), or much more
expensive if you are unlucky (having to actually read the data twice
from disk). I am not sure that it is worth it merely to reduce the
number of hashes/keys in the non-data blocks. It would be good to have
some data on this, for various file sizes and platforms (to judge
IO/RAM caching effects). As I said, I can't tell for sure if the 2nd
pass is virtually free or quite expensive -- and that is an important
detail. Especially with a larger block size, the overhead of an extra
key in the non-data blocks could be quite acceptable.

For 3.4 Namespaces, I would urge you to look at the GNU Name System
(GNS). My plan is to (eventually, when I have way too much time and
could actually re-do FS...) replace SBLOCKS and KBLOCKS of ECRS with
basically only GNS.

Anyway, please do keep us posted on major evolutions of the standard!
I doubt we'll adopt it with 4k blocks, but if that changes, adding the
verification capability wouldn't be a bad thing IMO.

happy hacking!

Christian

On 7/10/20 8:59 AM, pukkamustard wrote:
>
> Hello GNUNet,
>
> I'd like to request feedback, questions and comments on an encoding of
> content very much inspired by ECRS that I have been working on: Encoding
> for Robust Immutable Storage (ERIS)
>
> https://openengiadina.net/papers/eris.html
>
> The motivation is to use the encoding in a social network like settings
> where short messages and interactions are encoded using ERIS (as RDF
> [1]).
>
> There is one major difference to ECRS (and a couple smaller ones) that I
> would like to highlight:
>
>
> ** Verification capability
>
> ERIS adds a verification capability. Holders of the verification
> capability can enumerate all blocks required to decode the content and
> verify integrity of the blocks without being able to decode the content.
>
> This enables peers to cache the entire content without being able to
> read the content.
>
> The verification capability is enabled by using two keys:
>
> 1. A read key to encode the blocks holding content.
> 2. A verification key (which is deterministically derived from the read
> key) to encode the intermediary nodes of the Merkle tree.
>
> This makes the scheme slightly more complicated than ECRS and also
> requires a two-pass encoding (when using convergent encryption).
>
> Nevertheless I believe this is a very important feature that maybe
> results in a better privacy/complexity/availability trade-off as alluded
> to in a previous thread
> (https://lists.gnu.org/archive/html/gnunet-developers/2020-05/msg00015.html).
>
>
>
> ** Block size
>
> Block size is chosen to be 4kB. This an optimization towards small
> content (short messages and social interactions).
>
>
> ** URN
>
> Encoded content can be referred to by a URN making it usable from
> existing Web (and RDF) settings. This could be added to ECRS.
>
>
> ** No namespacing / keyword search
>
> There are currently no SBlock or KBlock like features. The idea is that
> these features can be built on-top of the base encoding (including
> SBlock and KBlock).
>
>
>
> We have a little JavaScript demo:
> https://openengiadina.gitlab.io/js-eris/ . As well as implementation in
> Guile [2].
>
> I'd be very happy for your insight and feedback.
>
> Thanks!
>
> -pukkamustard
>
>
> [1] https://openengiadina.net/papers/content-addressable-rdf.html
> [2] https://gitlab.com/openengiadina/data-model/
>
>
