Hi Igno,
Thanks for the documentation update! Comments and questions follow.
> In the example above, we have 2 chunks X[A,B] and Y[C,D] and a root chunk 
> G[M,E]. The cutoff height H=1 and the root height R=3.
I'm a bit confused as to how this maps between implementation and storage. From 
the MMR document linked 
(https://github.com/opentimestamps/opentimestamps-server/blob/master/doc/merkle-mountain-range.md)
 it appears to me that the node labeled G would not be considered part of the 
stored set, but merely computed from the constituent direct children M and E. 
This appears necessary to ensure the append-only nature of the MMR, as addition 
of another leaf F would require G to change to become a set of [M, Z] where Z 
is the root of [E,F].
Is G then not intended to be stored, but rather merely symbolic of the root? 
And if this is the case, I think the documentation could use some clarity 
around what is stored and what is not (given that X and Y, being full trees, 
could be stored without issue). If not, doesn't this imply that the root chunk 
needs to be mutated all the way to the root itself on any addition?
---
Given that (correct me if I'm mistaken) the storage format of the MMR is not a 
consensus-critical parameter but rather a detail of the implementation, I'm 
inclined to avoid any unnecessary bike-shedding. However, is it worth 
considering an incomplete chunk, rather than adding incomplete subtrees of 
height less than H to the root chunk? In the insert case, there is always at 
most a single incomplete chunk, meaning that there is no difficult seeking 
involved in an addition. The downside to this approach is that the incomplete 
chunk may contain multiple peaks, but due to the design of the MMR these peaks 
should be easy to identify. On the other hand, the use of an incomplete chunk 
removes the need for a painful compaction process where a portion of the root 
chunk must be cleaved and streamed to a new file. As doing so lazily may result 
in either an undesirable sparse root chunk file or a rewrite of a significant 
portion of the root file (up to the 3.3MB you calculated for the size of a 
chunk), it seems you would want to do this as soon as the subtree reaches the 
target height, causing potentially significant pauses during the block commit 
process. In exchange for a bit of complexity on calculating the MMR root (once 
per block proposed), you gain strict append-only behavior for additions. This 
incomplete chunk could be replicated in memory, alleviating most of the 
concerns around identifying peaks and calculating the new root.
---
> Each object is one of two things: a commitment indicating an unspent output 
> or a NULL marker indicating a spent one. It is a sum-tree over all unspent 
> outputs (spent ones contribute nothing to the sum).
This is straying a bit from your topic of storage, but I could use a bit of 
clarification here: my understanding is that inserting is strictly append-only, 
but deletion touches every node above the spent leaf all the way to the root. 
Is this accurate? Eg. in the example tree in your document, if A is spent, X 
must change to commit only to B, changing M and in turn G. I'm basing this, in 
part, off of Peter Todd's TXO MMR proposal for bitcoin.
If this is the case, I think it should be made explicit, since it has 
repercussions for bootstrapping and validation. Consider if, on bootstrap, a 
new node receives cut-through block A covering the entirety of block history 
minus the cut-through horizon. If the UTXO commitment was a merkle-tree 
covering _only_ the unspent outputs, the root in A could be verified against 
the outputs contained in A, and the new node could be ready to append block A+1 
and validate its changes to the unspent set; the root of A+1 could be validated 
with only blocks A and A+1. The chain state committed to by the header of A 
consists solely of the explicit inputs (coinbase inputs) and outputs of A. (Of 
course this design has severe repercussions for efficiently adding or spending 
outputs...).
On the other hand, in (my understanding of) the MMR-based design, block A does 
not contain enough information itself to recreate and verify its root; it 
requires knowledge of the structure of the merkle tree, less any fully pruned 
nodes. This must be provided by bootstrapping nodes. In other words, the chain 
state committed to by the block header and covered by the proof-of-work no 
longer consists solely of the explicit inputs and outputs of the cut-through 
block, but in a way also depends on the cut-through outputs and the order in 
which they were added to the blockchain.
This is not intended as a criticism of the design but merely something that I 
think bears clarification.
Thanks for any clarification!
--MW

> -------- Original Message --------
> Subject: [Mimblewimble] Sum tree storage
> Local Time: July 24, 2017 11:03 AM
> UTC Time: July 24, 2017 6:03 PM
> From: [email protected]
> To: [email protected] <[email protected]>
> Hi all,
> Now that Merope has the sum tree design mostly figured out and implemented, 
> I've started thinking about how to handle and store it as it grows in size. 
> I've just pushed a draft of the design I'll be going for in the Merkle tree 
> doc:
> https://github.com/ignopeverell/grin/blob/master/doc/merkle.md#storage
> Reviews and comments are welcome.
> As a side note mostly for Merope, externalizing the positions index means 
> we'll need methods (prune, replace, etc.) on the tree that take the positions 
> instead of the element itself. This is actually simpler. For unit tests we 
> can have a simple wrapper that replicates the HashMap based index.
> - Igno
-- 
Mailing list: https://launchpad.net/~mimblewimble
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~mimblewimble
More help   : https://help.launchpad.net/ListHelp

Reply via email to