Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 01/06/2014 10:31 PM, Thomas Voegtlin wrote: You are right. The 256-way branching follows from the fact that the tree was implemented using a key-value database operating with byte strings (leveldb). With this implementation constraint, a different branching would probably be possible but wasteful. Not really. Just use a suffix to determine the number of bits used in the final key byte. For example, the string abc would have the key 0x61626308 // abc\x08 Dropping the final bit would mean masking it off and having a different terminating value: 0x61626207 // abb\x07 That way you keep the lexical ordering of keys necessary for database iteration, and the efficient binary encoding. I see the advantage of doing that, but this looks really far-fetched.. My understanding is that it would require a complete change in the way clients and miners work. Could such a change be brought iteratively? It is an iterative change, I believe. You might be confusing this idea with Peter Todd's TXO commitment proposal using MMR trees, which is a drastic change with its own set of tradeoffs. Just to be clear, here's what I'm proposing: 1) Restructure the current UTXO index to be a Merkle tree, basically by splitting coins into individual outputs and adding interior nodes to the leveldb database. 2) Add hash commitments of this structure to the coinbase. It's still mapping txid's to unspent outputs, just as before - this has nothing to do with the script keyed wallet index. It's just now nodes can prefix optional proofs to block or transaction messages which prove by reference to the current best block's hash the spend status of the inputs of a transaction, or all the inputs of all the transactions of a block. If the more expensive proof-updatable hashing is used, then these proofs can even be composed or rebased onto a new block by applying the contents of an operational proof representing the diff between two blocks / the application of a series of transactions. This means that a node which does not have access to the UTXO set can nevertheless receive transactions or entire blocks with prefixed proofs and check the validity of the transaction with just the information available (proof + transaction contents). All that is required after the above soft-fork is a protocol version update and/or a service bit to indicate the ability to send or receive proof-prefixed messages. I'd call that an incremental update. [Aside: adding the wallet index requires storing the entire UTXO set in duplicated form, indexed this time by scriptPubKey or H(scriptPubKey), and including proofs of this structure as well. It is unlikely that any soft-fork would occur forcing consensus over the wallet index, but it could be done as a meta-chain or as an index covering just the contents of the block.] -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.14 (GNU/Linux) Comment: GPGTools - http://gpgtools.org Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJSzKQ2AAoJEAdzVfsmodw4hyoQAJ0f6P3ijZCEw7IPd/RcrmkI Viv4j17ZyAAcbNUplvjzhr/tIIKYPg51ltvfkp8cGRHgez88QsljzvM8B5n+nbPa jaaI6eiJ3AU1bR8hWYKtlXFwMvRjyr3ofl8hhTvYptGv9x3/Tr+2FwxIRY0413m6 2h95vItsvBs8v7clqLoBEqx9uyUpsH3+J32V4oGubrNAFXh1oOHi4Ban+TOKYaQV GHZaIZ3bVAvcMd5riaWSPUPLHwJnxQ8w6SlVRy2UNUPe+9yTuy4n1GW4vk4WHvop FgZFrM3LBmh1MhlYHRdEUUtwk3mfDuGbfW5UJVMri0Nis1PsXr5VK4qQaMbd/9e6 M2uWKslY9QCnzMajnHen9OwotteAJy2I1KHVcxXb0tFqrvqZ6o/auIe0G4VdKYuI XfNF3mokX93tiSflmphDba6qgB/W+Y6UD2gG2AeFuMGhFF/Hy62pVC6Zx7PKZ3vL Kh27rKkO/0FJau2JCQm5xBiQgCnKghqOiHefY3o+l+Y9kJ8fXKWCuwJ0lJ3LxZ2u 8H6sp6Jm9Ct9L90wSn7VmmI5H3bRe8sa7sylH4BR2T6jP3/tKDYTEeNWj+F9FfO1 FxsjYrjAyv1HxYYKd/Y1svEVSsKMv3a2SR9pF36ynBABdFjvx+oEuCyCO4tspFe6 15eA1QoMKvEQe/Ww5kRC =L9WT -END PGP SIGNATURE- -- Rapidly troubleshoot problems before they affect your business. Most IT organizations don't have a clear picture of how application performance affects their revenue. With AppDynamics, you get 100% visibility into your Java,.NET, PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro! http://pubads.g.doubleclick.net/gampad/clk?id=84349831iu=/4140/ostg.clktrk ___ Bitcoin-development mailing list Bitcoin-development@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bitcoin-development
Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees
On Sun, Jan 05, 2014 at 07:43:58PM +0100, Thomas Voegtlin wrote: Hello and happy new year to this mailing list! Thank you Mark for the incredible work you've been doing on this. I am following this very closely, because it is of primary importance for Electrum. I have written a Python-levelDB implementation of this UTXO hashtree, which is currently being tested, and will be added to Electrum servers. Along the lines of my recent post on blockchain data: Is it going to be possible to do partial prefix queries on that tree? Also have you considered creating per-block indexes of all scriptPubKeys, spent or unspent, queryable via the same partial prefix method? I too believe that BIPs should define interoperability points, but probably not implementation details. For the UTXO hashtree, this means that a BIP should at least specify how the root hash is constructed. This might be the only thing that needs to be specified. However, I see no pressing issue with writing a BIP; it might be preferable to implement and test different options first, and learn from that. It'd be very good to test this stuff thoroughly on Electrum first and get a feel for the performance and usability before any soft-fork to make it a miner commitment. Similarly a C++ implementation should be simply added to Bitcoin Core as a bloom filter replacement and made available over the P2P network. -- 'peter'[:-1]@petertodd.org 09bc28e08b41a74801c5878bf87978c2486aee7ed8a85778 signature.asc Description: Digital signature -- Rapidly troubleshoot problems before they affect your business. Most IT organizations don't have a clear picture of how application performance affects their revenue. With AppDynamics, you get 100% visibility into your Java,.NET, PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro! http://pubads.g.doubleclick.net/gampad/clk?id=84349831iu=/4140/ostg.clktrk___ Bitcoin-development mailing list Bitcoin-development@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bitcoin-development
Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 01/06/2014 10:13 AM, Peter Todd wrote: On Sun, Jan 05, 2014 at 07:43:58PM +0100, Thomas Voegtlin wrote: I have written a Python-levelDB implementation of this UTXO hashtree, which is currently being tested, and will be added to Electrum servers. Along the lines of my recent post on blockchain data: Is it going to be possible to do partial prefix queries on that tree? There's really two tree structures being talked about here. Correct me if I'm wrong Thomas, but last time I looked at your code it was still implementing a 256-way PATRICIA trie, correct? This structure lends itself to indexing either scriptPubKey or H(scriptPubKey) with approximately the same performance characteristics, and in the Ultimate blockchain compression thread there is much debate about which to use. In the process of experimentation I've since moved from a 256-way PATRICIA trie to a bitwise, non-level-compressed trie structure - what I call proof-updatable trees in the BIP. These have the advantage of allowing stateless application of one proof to another, and as consequence enable mining mempool operations without access to the UTXO set, so long as proofs are initially provided in the transaction block wire format. The disadvantage is that performance is closely tied to key length, making H(scriptPubKey) the much more desirable option. I'm sure you see that as an advantage, however :) Also have you considered creating per-block indexes of all scriptPubKeys, spent or unspent, queryable via the same partial prefix method? This would be quite easy to do, separate from the UTXO structure but using the same trie format. -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.14 (GNU/Linux) Comment: GPGTools - http://gpgtools.org Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJSy0iFAAoJEAdzVfsmodw434MQAIA/fDYT7SfMtfLEgDQKhXCn slRqFEx/HXjvgHHSYnbr9V+8LrGzNvT2ImebbV9ge8VlziAFNGIUq2EYhFs4kHWu GVm9aL8Jj/27SvM0tRwr9n2XIifKOh2sVINAjbv+UwPv/O+cULU95/b53DEF6aqI OWxioOR50TPe4t9AevAGVypNLm1DsyDdymhO9xyBN92xGTNj5QKL5hHG3kcsLIl1 7KaxO0w4UC2sdSGj9FeyH1b0zYg8FlzjJHc1CUshHwUwyYo8LRJtRypL5lrayERg Er/kIGEDovcenNBW8G79l+8VKPfB/lMTssT2pDiQL+1e1fg46CIQxHSyap2JSFTE jgleRk/+1NK/ZjOQ8dEBPZK3TE1WY3qlm/ekjG/8W5kXqcxzFBoAkeBNXuJ/8UMi mKe+DTmbp0xnvLO1p+hpugXKfrQSpcFL+ZvJHlFS1lz7O1N3WvuDCNP9El+L6ueM nFzjr1NTnX0z4vYtscI7qBKVqUrB7Z84c3O/lSYpw4Jilxl4trzV4cn7+AF7KWGM ktR9JJeIoNcJ2Zx4EpRp6OSwhtLkWZyLpPnidQ2p6ev2ytXpTpGsW/i5XS2w57UD 2IG5E0Q7Xzvd58lI/YollWQcagVOZdyzYXa+wVZoFQ6gLF47andpUmtUJOhI7gxv T/rWhPhkTMUn8TdvUcV/ =N9zM -END PGP SIGNATURE- -- Rapidly troubleshoot problems before they affect your business. Most IT organizations don't have a clear picture of how application performance affects their revenue. With AppDynamics, you get 100% visibility into your Java,.NET, PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro! http://pubads.g.doubleclick.net/gampad/clk?id=84349831iu=/4140/ostg.clktrk ___ Bitcoin-development mailing list Bitcoin-development@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bitcoin-development
Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees
Le 07/01/2014 01:21, Mark Friedenbach a écrit : -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 01/06/2014 10:13 AM, Peter Todd wrote: On Sun, Jan 05, 2014 at 07:43:58PM +0100, Thomas Voegtlin wrote: I have written a Python-levelDB implementation of this UTXO hashtree, which is currently being tested, and will be added to Electrum servers. Along the lines of my recent post on blockchain data: Is it going to be possible to do partial prefix queries on that tree? There's really two tree structures being talked about here. Correct me if I'm wrong Thomas, but last time I looked at your code it was still implementing a 256-way PATRICIA trie, correct? This structure lends itself to indexing either scriptPubKey or H(scriptPubKey) with approximately the same performance characteristics, and in the Ultimate blockchain compression thread there is much debate about which to use. You are right. The 256-way branching follows from the fact that the tree was implemented using a key-value database operating with byte strings (leveldb). With this implementation constraint, a different branching would probably be possible but wasteful. My recent code creates one leaf per unspent, and uses 56-byte keys built as: H(scriptPubKey) + txid + txpos (This is not pushed yet, it needs cleanup. Previous code created one leaf per address) Partial prefix queries are possible with database iterators. In the process of experimentation I've since moved from a 256-way PATRICIA trie to a bitwise, non-level-compressed trie structure - what I call proof-updatable trees in the BIP. These have the advantage of allowing stateless application of one proof to another, and as consequence enable mining mempool operations without access to the UTXO set, so long as proofs are initially provided in the transaction block wire format. I see the advantage of doing that, but this looks really far-fetched.. My understanding is that it would require a complete change in the way clients and miners work. Could such a change be brought iteratively? -- Rapidly troubleshoot problems before they affect your business. Most IT organizations don't have a clear picture of how application performance affects their revenue. With AppDynamics, you get 100% visibility into your Java,.NET, PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro! http://pubads.g.doubleclick.net/gampad/clk?id=84349831iu=/4140/ostg.clktrk ___ Bitcoin-development mailing list Bitcoin-development@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bitcoin-development
Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees
Hello and happy new year to this mailing list! Thank you Mark for the incredible work you've been doing on this. I am following this very closely, because it is of primary importance for Electrum. I have written a Python-levelDB implementation of this UTXO hashtree, which is currently being tested, and will be added to Electrum servers. My implementation follows Alan Reiner's idea to store the tree as items in a key-value database. I believe that a C++ implementation like yours will be at least an order of magnitude faster, and I am looking forward to it. I too believe that BIPs should define interoperability points, but probably not implementation details. For the UTXO hashtree, this means that a BIP should at least specify how the root hash is constructed. This might be the only thing that needs to be specified. However, I see no pressing issue with writing a BIP; it might be preferable to implement and test different options first, and learn from that. Thomas Le 20/12/2013 02:47, Mark Friedenbach a écrit : -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hello fellow bitcoin developers. Included below is the first draft of a BIP for a new Merkle-compressed data structure. The need for this data structure arose out of the misnamed Ultimate blockchain compression project, but it has since been recognized to have many other applications. In addition to this BIP I am preparing three additional BIPs describing the use of this data structure in stateless validation mining, the UBC address index for SPV+ operating modes, document timestamping and merged mining. A Python implementation of this data structure is available here: https://github.com/monetizeio/python-bitcoin A C++ implementation is being worked on. As per the BIP-1 procedure, I am submitting this rough draft to the community for discussion. I welcome all comments and criticisms of both form and content. - -Mark ==Abstract== This BIP describes a [http://en.wikipedia.org/wiki/Hash_tree Merkle hash tree] variant of the [http://en.wikipedia.org/wiki/Trie prefix-tree data structure], ideally suited for encoding key-value indices which support memory-efficient proofs. ==Motivation== There are a number of applications which would benefit from having a data structure with the following properties: * '''Arbitrary mapping of keys to values.''' A ''key'' can be any bytestring, and its ''value'' any other bytestring. * '''Duplicate keys disallowed.''' Every key has one, and only one value associated with it. Some applications demand assurance that no key value is reused, and that this constraint can be checked without requiring access to the entire data structure. * '''Efficient look-up by key.''' The data structure should support sub-linear lookup operations with respect to the number of keys in the mapping. Logarithmic time or linear with respect to the length of the key should be achievable and would be sufficient for realistic applications. * '''Merkle compression of mapping structure.''' It should be possible to produce a reduced description of the tree consisting of a single root hash value which is deterministically calculated from the mapping structure. * '''Efficient proofs of inclusion.''' It should be possible to extract a proof of key/value mapping which is limited in size and verification time by the length of the key in the worst case. * '''Computation of updates using local information.''' Given a set of inclusion proofs, it should be possible to calculate adjustments to the local mapping structure (update or deletion of included mappings, or insertion between two included mappings which are adjacent in the global structure). Such applications include committed validation indices which enable stateless mining nodes, committed wallet indices which enable trust-less querying of the unspent transaction output set by codescriptPubKey/code, efficient document time-stamping, and secure efficient merged mining. This BIP describes an authenticated prefix tree which has the above properties, but leaves the myriad applications to be formalized in future BIPs. ==Data structure== This BIP defines a binary prefix tree. Such a structure provides a mapping of bitstrings (the ''keys'') to bytestrings (the ''values''). It is an acyclic binary tree which implicitly encodes keys within the traversal path -- a left branch is a 0, and a right branch is a 1. Each node is reachable by only one unique path, and reading off the branches taken (0 for each left, 1 for each right) as one follows the path from root to target yields the node's key. The particular binary prefix tree defined by this BIP is a hybrid PATRICIA / de la Brandais tree structure. [http://en.wikipedia.org/wiki/Radix_tree PATRICIA trees] compress a long sequence of non-branching nodes into a single interior node with a per-branch ''skip prefix''. This achieves significant savings in storage space, root
Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi Jeremy, Let's give a preview of the application-oriented BIPs I mentioned: Stateless validation and mining involves prefixing transaction and block messages with proofs of their UTxO state changes. These are the operational proofs I describe in the draft, and they work on prefix trees whose root hashes committed to the coinbase in a soft-fork upgrade of the validation rules. Ultimate blockchain compression involves consensus over an address index, which can be queried over the p2p network by lightweight nodes. The structure of the index is an authenticated prefix tree, and the results of such a query is an an inclusion proof. Document time-stamping and this new method of merged mining use the same structure: a prefix tree whose root hash value is committed to a pruneable output of the coinbase transaction. Document timestamp proofs and merged mining proof-of-works are inclusion proofs over this tree. I hope that shows how the BIP directly affects interoperability of the bitcoin protocol and clients which use these applications. I released this BIP first to get some feedback on the structure itself, which will be used by all of the application-specific BIPs which follow. Stepping back and speaking generically, the purpose of a BIP as I see it is to standardize details which affect interoperability between clients. In fact, at a cursory glance only about half of the BIPs deal with protocol issues directly - the rest deal with local / user-interface issues like key derivation or JSON-RPC APIs. Even if none of the applications involved protocol changes, I still think BIPs like this would be of value in that they serve to standardize things which are or will seek to become commonly used and widely implemented. Cheers, Mark On 12/19/2013 10:48 PM, Jeremy Spilman wrote: Wow there's a lot here to think about. I'm pretty sure I haven't grasped the full implications yet. I see it proposes to also introduce additional BIPs describing the use of the data stucture for stateless validation mining, the UBC address index for SPV+ operating modes, document timestamping and merged mining. Can the BIP stand alone as a BIP without some specific changes to the protocol or end-user accessible features defined within it? It seems like an extremely useful data stucture, but as I understand it the purpose of BIPS is defining interoperability points, not implementation details? Unless the tree itself is becoming part of the protocol, seems like its spec, test vectors, and reference implementation can live elsewhere, but I would love to read about BIPS which use this tree to accomplish some amazing scalability or security benefits. -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.14 (GNU/Linux) Comment: GPGTools - http://gpgtools.org Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJStChCAAoJEAdzVfsmodw42DcQAIlgkukh5K/XYloIiT5pgaHS xCZXtOvxpNUep8x35rvEO1ePjvPvUkbUE2jRw2se1rSMkhzw3PpHHtXV/gIOGqUe WVKeeIM5pZX56sEcEdUQ1pTwB2rmtSNeyCuHl8fLatk8eLhcAHcpv/7esLuAjWCr EE840s8+q3ltuzKi3nqxK84bwIohgSMKhncfonNp5lMAtug8Itqopq3DPDoxwiK/ qUwSz5UCEMH6oNHnywzhKGUhBErqo4q8IgAKcZYBZZ9n4BRjf4ngoCw9n5wCef8v tyTvwrg0nSQTQa67cg7RCsY7SisrI9gaMvCYTSvEMKdw9X0aqAX1p0yZpTbV+dIr Q2ZT6gJmg2sD22zKY1/58oq+PiNO+nRS81OG2znZofsIfhOVW0bIZAQ8+zZtFW40 vXxMuHCNieCK8e7f9A6LLv/Zz7rmNxdQ6cHBEL1nIs1Y4d1FpHJVI2LHi54QmVXf C5PKF/e7K2eD3LZMNxS818rZaiJJ7qmpjS3rkG2owHyJHEhBJIlkYXfI1YCraT+b R5AzAh2Oz0Nyb5ChP2VSaecJNjGvRMo7Z6HCytmgAGOEcDDZkxSv0kkprbvchqXx XziFgA4iSajBKYWPiPLGMADfMPT6zd4fhDjyaN8+LPO3d3ZK1VwmQDLRQ3DxfeIP RgchHR/pS77XI7hCFwtN =ao17 -END PGP SIGNATURE- -- Rapidly troubleshoot problems before they affect your business. Most IT organizations don't have a clear picture of how application performance affects their revenue. With AppDynamics, you get 100% visibility into your Java,.NET, PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro! http://pubads.g.doubleclick.net/gampad/clk?id=84349831iu=/4140/ostg.clktrk ___ Bitcoin-development mailing list Bitcoin-development@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bitcoin-development
Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees
On Fri, Dec 20, 2013 at 03:10:50AM -0800, Mark Friedenbach wrote: On 12/20/2013 02:48 AM, Peter Todd wrote: On Thu, Dec 19, 2013 at 05:47:52PM -0800, Mark Friedenbach wrote: This BIP describes the authenticated prefix tree and its many variations in terms of its serialized representation. Additional BIPs describe the application of authenticated prefix trees to such applications as committed indices, document time-stamping, and merged mining. Could you expand more on how prefix trees could be used for time-stamping and merged mining? The root hash of a prefix tree is placed in the coinbase at a location standardized by convention. Right, last txout in an OP_RETURN like we discussed. For document time-stamping, the key can be the hash of the document. Don't you mean the value is the hash of the document and the key is irrelevant? For merged mining, the key is the hash of the genesis block of the altchain, and the value is the hash of the aux-pow (for p2pool, the share hash). What's the advantage over the direction-based system I proposed before? Seems to me the code required to validate the proof is significantly more complex in your scheme. http://www.mail-archive.com/bitcoin-development@lists.sourceforge.net/msg03149.html In the system I have in mind this adds 43 bytes to the coinbase transaction, By 43 bytes you mean the whole op_return txout right? dict = AuthTree() dict['Curie'] = VARINT(1898) dict('Einstein') = VARINT(1905) dict['Fleming'] = VARINT(1928) dict['中本'] = VARINT(2009) I'd be inclined to leave the unicode out of the code examples as many editors and shells still don't copy-and-paste it nicely. Using it in BIP documents themselves is fine and often has advantages re: typesetting, but using it in crypto examples like this just makes it harder to reproduce the results by hand unnecessarily. Thanks for the feedback, I rather agree. When I was creating that example for some reason I wanted the right branch of the root node to be used, which is difficult when only 7-bit ASCII keys are used. But I don't think the illustrative point I had in mind ended up being particularly relevant, so I'll rework it. That example is python, so I'd suggest just using escape sequences myself. You probably also should include the b prefix to make the strings explicitly binary for py3 compatibility, ie dict[b'\xbe\xef'] -- 'peter'[:-1]@petertodd.org 000216e3750a9ad9584395352d728a3c543844eab3bfc9ce1073 signature.asc Description: Digital signature -- Rapidly troubleshoot problems before they affect your business. Most IT organizations don't have a clear picture of how application performance affects their revenue. With AppDynamics, you get 100% visibility into your Java,.NET, PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro! http://pubads.g.doubleclick.net/gampad/clk?id=84349831iu=/4140/ostg.clktrk___ Bitcoin-development mailing list Bitcoin-development@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bitcoin-development
Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 (Sorry Peter, this was meant for the whole list:) On 12/20/2013 05:17 AM, Peter Todd wrote: I've thought about this for awhile and come to the conclusion that UTXO commitments are a really bad idea. I myself wanted to see them implemented about a year ago for fidelity bonded banks, but I've changed my mind and I hope you do too. They force miners and every full node with SPV clients to store the entire UTXO set in perpetuity. This is incorrect. If the slower proof-updatable hashes are used, then mining only requires what I've called operational proofs to be attached to received transactions and blocks. Access to the UTXO set is required to make new transactions, at least for the outputs of the transaction, but I do not believe this is as significant a problem as you do. It is a service that can be outsourced for a minimal fee - include an explicit output of the necessary amount to a scriptPubKey specified by the archival node, and they will make sure the proper proofs are attached. This is bad by itself, but then they make it even worse by making Bitcoin really useful and convenient to use as a decentralized database; UTXO commitments make it easy and convenient to implement systems like Namecoin on top of Bitcoin, yet we don't have the UTXO expiration that might make such uses reasonable. Right now the UTXO set is reasonable small - ~300MB - but that can and will change if we make it an attractive way to store data. UTXO commitments do exactly that. You might have to explain this to me, but it is not clear to me how the validation index could be twisted into providing a Namecoin-like system. Or the address index either, which I presume is what you are referring to. Namecoin works by assigning domains to outputs, and then tracking ownership and configuration of that domain through chains of outputs. But the UTXO set doesn't contain connecting information. At best all it would be is a glorified, and expensive time-stamper, unattractive because there are already better solutions. You're also *not* giving users what they actually want: the transactions associated with their wallets. Even though Electrum could easily work via a pure UTXO database they implemented transaction lookup instead; Electrum servers cough up every transaction associated with a user's wallet. If you're going to do that, it's just as easy to do per-block lookup trees which don't force the UTXO set to be stored. At the cost of having the supposedly lightweight client query for each of its coins on every single block, to construct a negative proof-of-spend. There's also a more subtle issue: the security model of UTXO commitments sucks. It encourages wallets to essentially trust single confirmations because it's unlikely that nodes will want to store the multiple copies of the UTXO set required to provide proof of multiple confirmations. Basically the issue is when you start your wallet you get a proof of UTXO set for the most recent block; that's just one confirmation. To get more confirmations you have to wait for subsequent blocks, checking the set on each block. Per block indexes on the other hand naturally lead wallets to count confirmations properly. I don't think this is true, or at least you are not considering available optimizations. You certainly don't need to store multiple copies of the UTXO set. I'm a little confused as to the exact situation you are describing. When a key is loaded into a wallet, or a wallet comes online after a significant absence, it looks for coins in the current UTXO set. If any coins are found, their attached transaction record has a block height field, so the confirmation count can be derived from that. As blocks go by that count is naturally increased. I'm not sure how this is different from the current situation. -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.14 (GNU/Linux) Comment: GPGTools - http://gpgtools.org Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJStI9aAAoJEAdzVfsmodw4IooP/1uK9cvP1vxXyQRbAHf9oFXw AmZ8p1+t8f6MHUpjkv/Xn0poFNU8qSnNz65drQdq8ErcJnqe4V3Wt6G32/uCxvZs 6AX6bRYQIfhHY0DBPgfacO5/ALdlnS4NdjWFCA5hHDgLd30BpbU1WK1ze985TXrd +ucQkzcMYEDW2lb+sFvfhpi9ZPFd34ZrNzH//oS794eYKWAmj7jXqdgxk5AKat61 Xileq5beE4xom3pChXc3PtIJKsoil5SjE20/FW52wcCdyaEFG0kwl937pEGjQnlP mylK/ilfZ6cvRC8MmVnl/6AC4V2hjB4Ncej03jG3JI2FdaJEOHuHg0uh8/Zl1I4A YVIKyrHQhQb/VGsfXtW3zokHzDeEtJwlx+PPFaLc9QurFirNjSnenhbw4Vpbg3Xt dH1Qd9xWcT85a19Oz8Q4rt3z7UmX9J/geZrUHCuPtr47yXU0e1Cc6ZP9zDYNtfKU q6MjNZiaLJ/Wp0n4IeQ/4/wqy0rM/psP9i5d6IdP96tayVM9aKj5Lh9lU/Od5wGO 2PPB4kvhJfMbx3o+S7UK5vra7ysZzULpoVGDpUR3xRM72l//vlNhSLK5nILVO3r8 sIC5+3WoZLUKvuNo45/BDxXHZajrWLCU84WrwHVm1u7SHfBQcoES/rhcx2zlgyx0 /Iwxsgb7Fznl+eM2bEpZ =TtaV -END PGP SIGNATURE- -- Rapidly troubleshoot problems before they affect your business. Most IT organizations
Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees
On Thu, Dec 19, 2013 at 5:47 PM, Mark Friedenbach m...@monetize.io wrote: Hello fellow bitcoin developers. Included below is the first draft of a BIP for a new Merkle-compressed data structure. The need for this data structure arose out of the misnamed Ultimate blockchain compression project, but it has since been recognized to have many other applications. A couple very early comments— I shared some of these with you on IRC but I thought I'd post them to make them more likely to not get lost. Whats a VARCHAR() A zero terminated string? A length prefixed string? How is the length encoded? Hopefully not in a way that has redundancy, since things that don't survive a serialization round trip is a major trap. Is the 'middle' the best place for the extradata? Have you contemplated the possibility that some applications might use midstate compression? On that general subject, since the structure here pretty much always guarantees two compression function invocations. SHA512/256 might actually be faster in this application. Re: using sha256 instead of sha256^2, we need to think carefully about the implications of Merkle-Damgard generic length extension attacks. It would be unfortunately to introduce them here, even though they're currently mostly theoretical for sha256. WRT hash function performance, hash functions are so ludicrously fast (and will be more so as processors get SHA2 instructions) that the performance of the raw compression function would hardly ever be a performance consideration unless you're using a slow interpreted language (... and that sounds like a personal problem to me). So I don't think CPU performance should be a major consideration in this BIP. What I do think should be a consideration is the cost of validating the structure under a zero-knowledge proof. An example application is a blind proof for a SIN or a proof of how much coin you control... or even a proof that a block was a correctly validated one, and in these cases additional compression function calls are indeed pretty expensive. But they're not the only cost, any conditional logic in the hash tree evaluation is expensive, and particular, I think that any place where data from children will be combined with a variable offset (especially if its not word aligned) would potentially be rather expensive. I'm unconvinced about the prefix tree compressed applications, since they break compact update proofs. If we used them in the Bitcoin network they could only be used for data where all verifying nodes had all their data under the tree. I think they add a lot of complexity to the BIP (esp from people reading the wrong section), so perhaps they should be split into another document? In any case, I want to thank you for talking the time to write this up. You've been working on this stuff for a while and I think it will be lead to useful results, even if we don't end up using how it was originally envisioned. -- Rapidly troubleshoot problems before they affect your business. Most IT organizations don't have a clear picture of how application performance affects their revenue. With AppDynamics, you get 100% visibility into your Java,.NET, PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro! http://pubads.g.doubleclick.net/gampad/clk?id=84349831iu=/4140/ostg.clktrk ___ Bitcoin-development mailing list Bitcoin-development@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bitcoin-development
Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 12/20/2013 11:48 AM, Gregory Maxwell wrote: A couple very early comments— I shared some of these with you on IRC but I thought I'd post them to make them more likely to not get lost. I got the inputs from IRC, but thank you for posting to the list so that others can see and review. Whats a VARCHAR() A zero terminated string? A length prefixed string? How is the length encoded? Hopefully not in a way that has redundancy, since things that don't survive a serialization round trip is a major trap. A length-prefixed string, using the shortest representation VARINT for the length. Same as how scripts are serialized in transactions. Is the 'middle' the best place for the extradata? Have you contemplated the possibility that some applications might use midstate compression? Yes I considered midstate compression which is why the branch hashes come last, but extra was an oversight. In every application I've considered it's either not used (and therefore a single byte), or updated whenever the node or its children updates. Honestly I don't expect midestate compression to offer much since in the nodes that are updated frequently it is unlikely that there will be enough static data at the front to fill even a 512 bit block of the smaller hash function. But it doesn't hurt to prepare just in case. I'll move it to the end. On that general subject, since the structure here pretty much always guarantees two compression function invocations. SHA512/256 might actually be faster in this application. Yes, this is a great suggestion. Moving to SHA-512/256 will let most inner nodes fit inside a single block, so long as the extra field is not too long. Also apparently SHA-512 is faster on 64-bit CPUs, which is a nice advantage. I didn't know that. I'm concerned about speed but I did not go with a faster hash function because users are more likely to have hardware acceleration for the SHA-2 family. Re: using sha256 instead of sha256^2, we need to think carefully about the implications of Merkle-Damgard generic length extension attacks. It would be unfortunately to introduce them here, even though they're currently mostly theoretical for sha256. The serialization format encodes lengths in such a way that you cannot extend the data structure merely by appending bits. You would have to change the prior, already hashed bits as well. I believe this makes it immune to length extension attacks. WRT hash function performance, hash functions are so ludicrously fast (and will be more so as processors get SHA2 instructions) that the performance of the raw compression function would hardly ever be a performance consideration unless you're using a slow interpreted language (... and that sounds like a personal problem to me). So I don't think CPU performance should be a major consideration in this BIP. Well.. the UTXO tree is big. Let's assume 5,000 transactions per block, with an average of 3 inputs/outputs per transaction. This is close to the worst-case scenario with the current block size. That's 15,000 insert, update, or delete operations. The number of hashes required when level-compression is used is log2 the number of items in the tree, which for bitcoin is currently about 2.5 million transactions. So that's about ~21 hashes per input/ouput, or 315,000 hash operations. A CPU is able to do about 100,000 hashes per second per core, that'll probably take about a second on a modern 4- or 8-core machine. For updatable proofs, the number of hash operations is equal to the number of bits in the key, which for the validation index is always 256. That means 3.84 million hashes, or about 10 seconds on a 4-core machine. The numbers for the wallet index are worse, as it scales with the number of outputs, which is necessarily larger, and the keys are longer. This is not an insignificant cost in the near term, although it is the type of operation that could be easily offloaded to a GPU or FPGA. What I do think should be a consideration is the cost of validating the structure under a zero-knowledge proof. An example application is a blind proof for a SIN or a proof of how much coin you control... or even a proof that a block was a correctly validated one, and in these cases additional compression function calls are indeed pretty expensive. But they're not the only cost, any conditional logic in the hash tree evaluation is expensive, and particular, I think that any place where data from children will be combined with a variable offset (especially if its not word aligned) would potentially be rather expensive. This is something I know less about, and I welcome constructive input. There is *no* reason that the hash serialization needs to have fancy space-saving features. You could even make the SIG_HASH node serialization into fixed-size, word-aligned data structures. But this is absolutely not my field, and I may need some hand-holding. Do
[Bitcoin-development] BIP proposal: Authenticated prefix trees
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hello fellow bitcoin developers. Included below is the first draft of a BIP for a new Merkle-compressed data structure. The need for this data structure arose out of the misnamed Ultimate blockchain compression project, but it has since been recognized to have many other applications. In addition to this BIP I am preparing three additional BIPs describing the use of this data structure in stateless validation mining, the UBC address index for SPV+ operating modes, document timestamping and merged mining. A Python implementation of this data structure is available here: https://github.com/monetizeio/python-bitcoin A C++ implementation is being worked on. As per the BIP-1 procedure, I am submitting this rough draft to the community for discussion. I welcome all comments and criticisms of both form and content. - -Mark ==Abstract== This BIP describes a [http://en.wikipedia.org/wiki/Hash_tree Merkle hash tree] variant of the [http://en.wikipedia.org/wiki/Trie prefix-tree data structure], ideally suited for encoding key-value indices which support memory-efficient proofs. ==Motivation== There are a number of applications which would benefit from having a data structure with the following properties: * '''Arbitrary mapping of keys to values.''' A ''key'' can be any bytestring, and its ''value'' any other bytestring. * '''Duplicate keys disallowed.''' Every key has one, and only one value associated with it. Some applications demand assurance that no key value is reused, and that this constraint can be checked without requiring access to the entire data structure. * '''Efficient look-up by key.''' The data structure should support sub-linear lookup operations with respect to the number of keys in the mapping. Logarithmic time or linear with respect to the length of the key should be achievable and would be sufficient for realistic applications. * '''Merkle compression of mapping structure.''' It should be possible to produce a reduced description of the tree consisting of a single root hash value which is deterministically calculated from the mapping structure. * '''Efficient proofs of inclusion.''' It should be possible to extract a proof of key/value mapping which is limited in size and verification time by the length of the key in the worst case. * '''Computation of updates using local information.''' Given a set of inclusion proofs, it should be possible to calculate adjustments to the local mapping structure (update or deletion of included mappings, or insertion between two included mappings which are adjacent in the global structure). Such applications include committed validation indices which enable stateless mining nodes, committed wallet indices which enable trust-less querying of the unspent transaction output set by codescriptPubKey/code, efficient document time-stamping, and secure efficient merged mining. This BIP describes an authenticated prefix tree which has the above properties, but leaves the myriad applications to be formalized in future BIPs. ==Data structure== This BIP defines a binary prefix tree. Such a structure provides a mapping of bitstrings (the ''keys'') to bytestrings (the ''values''). It is an acyclic binary tree which implicitly encodes keys within the traversal path -- a left branch is a 0, and a right branch is a 1. Each node is reachable by only one unique path, and reading off the branches taken (0 for each left, 1 for each right) as one follows the path from root to target yields the node's key. The particular binary prefix tree defined by this BIP is a hybrid PATRICIA / de la Brandais tree structure. [http://en.wikipedia.org/wiki/Radix_tree PATRICIA trees] compress a long sequence of non-branching nodes into a single interior node with a per-branch ''skip prefix''. This achieves significant savings in storage space, root hash calculation, and traversal time. A de la Brandais trie achieves compression by only storing branches actually taken in a node. The space savings are minimal for a binary tree, but place the serialized size of a non-branching interior node under the SHA-256 block size, thereby reducing the number of hash operations required to perform updates and validate proofs. This BIP describes the authenticated prefix tree and its many variations in terms of its serialized representation. Additional BIPs describe the application of authenticated prefix trees to such applications as committed indices, document time-stamping, and merged mining. ==Serialization format== As a hierarchical structure, the serialization of an entire tree is the serialization of its root node. A serialized node is the concatenation of five structures: node := flags || VARCHAR(extra) || value || left || right The codeflags/code is a single byte field whose composite values determine the bytes that follow. flags = (left_flags 0) | (right_flags 2) | (has_value4) |