Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees

2014-01-07 Thread Mark Friedenbach
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 01/06/2014 10:31 PM, Thomas Voegtlin wrote:
 You are right. The 256-way branching follows from the fact that the
 tree was implemented using a key-value database operating with byte
 strings (leveldb). With this implementation constraint, a different
 branching would probably be possible but wasteful.

Not really. Just use a suffix to determine the number of bits used in
the final key byte. For example, the string abc would have the key

0x61626308 // abc\x08

Dropping the final bit would mean masking it off and having a
different terminating value:

0x61626207 // abb\x07

That way you keep the lexical ordering of keys necessary for database
iteration, and the efficient binary encoding.

 I see the advantage of doing that, but this looks really
 far-fetched.. My understanding is that it would require a complete
 change in the way clients and miners work. Could such a change be
 brought iteratively?

It is an iterative change, I believe. You might be confusing this idea
with Peter Todd's TXO commitment proposal using MMR trees, which is a
drastic change with its own set of tradeoffs. Just to be clear, here's
what I'm proposing:

1) Restructure the current UTXO index to be a Merkle tree, basically
by splitting coins into individual outputs and adding interior nodes
to the leveldb database.

2) Add hash commitments of this structure to the coinbase.

It's still mapping txid's to unspent outputs, just as before - this
has nothing to do with the script keyed wallet index. It's just now
nodes can prefix optional proofs to block or transaction messages
which prove by reference to the current best block's hash the spend
status of the inputs of a transaction, or all the inputs of all the
transactions of a block.

If the more expensive proof-updatable hashing is used, then these
proofs can even be composed or rebased onto a new block by applying
the contents of an operational proof representing the diff between
two blocks / the application of a series of transactions.

This means that a node which does not have access to the UTXO set can
nevertheless receive transactions or entire blocks with prefixed
proofs and check the validity of the transaction with just the
information available (proof + transaction contents).

All that is required after the above soft-fork is a protocol version
update and/or a service bit to indicate the ability to send or receive
proof-prefixed messages. I'd call that an incremental update.

[Aside: adding the wallet index requires storing the entire UTXO set
in duplicated form, indexed this time by scriptPubKey or
H(scriptPubKey), and including proofs of this structure as well. It is
unlikely that any soft-fork would occur forcing consensus over the
wallet index, but it could be done as a meta-chain or as an index
covering just the contents of the block.]
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJSzKQ2AAoJEAdzVfsmodw4hyoQAJ0f6P3ijZCEw7IPd/RcrmkI
Viv4j17ZyAAcbNUplvjzhr/tIIKYPg51ltvfkp8cGRHgez88QsljzvM8B5n+nbPa
jaaI6eiJ3AU1bR8hWYKtlXFwMvRjyr3ofl8hhTvYptGv9x3/Tr+2FwxIRY0413m6
2h95vItsvBs8v7clqLoBEqx9uyUpsH3+J32V4oGubrNAFXh1oOHi4Ban+TOKYaQV
GHZaIZ3bVAvcMd5riaWSPUPLHwJnxQ8w6SlVRy2UNUPe+9yTuy4n1GW4vk4WHvop
FgZFrM3LBmh1MhlYHRdEUUtwk3mfDuGbfW5UJVMri0Nis1PsXr5VK4qQaMbd/9e6
M2uWKslY9QCnzMajnHen9OwotteAJy2I1KHVcxXb0tFqrvqZ6o/auIe0G4VdKYuI
XfNF3mokX93tiSflmphDba6qgB/W+Y6UD2gG2AeFuMGhFF/Hy62pVC6Zx7PKZ3vL
Kh27rKkO/0FJau2JCQm5xBiQgCnKghqOiHefY3o+l+Y9kJ8fXKWCuwJ0lJ3LxZ2u
8H6sp6Jm9Ct9L90wSn7VmmI5H3bRe8sa7sylH4BR2T6jP3/tKDYTEeNWj+F9FfO1
FxsjYrjAyv1HxYYKd/Y1svEVSsKMv3a2SR9pF36ynBABdFjvx+oEuCyCO4tspFe6
15eA1QoMKvEQe/Ww5kRC
=L9WT
-END PGP SIGNATURE-

--
Rapidly troubleshoot problems before they affect your business. Most IT 
organizations don't have a clear picture of how application performance 
affects their revenue. With AppDynamics, you get 100% visibility into your 
Java,.NET,  PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349831iu=/4140/ostg.clktrk
___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees

2014-01-06 Thread Peter Todd
On Sun, Jan 05, 2014 at 07:43:58PM +0100, Thomas Voegtlin wrote:
 Hello and happy new year to this mailing list!
 
 
 Thank you Mark for the incredible work you've been doing on this.
 I am following this very closely, because it is of primary importance
 for Electrum.
 
 I have written a Python-levelDB implementation of this UTXO hashtree,
 which is currently being tested, and will be added to Electrum servers.

Along the lines of my recent post on blockchain data:

Is it going to be possible to do partial prefix queries on that tree?

Also have you considered creating per-block indexes of all
scriptPubKeys, spent or unspent, queryable via the same partial prefix
method?

 I too believe that BIPs should define interoperability points, but probably
 not implementation details. For the UTXO hashtree, this means that a BIP
 should at least specify how the root hash is constructed. This might be the
 only thing that needs to be specified.
 
 However, I see no pressing issue with writing a BIP; it might be preferable
 to implement and test different options first, and learn from that.

It'd be very good to test this stuff thoroughly on Electrum first and
get a feel for the performance and usability before any soft-fork to
make it a miner commitment.

Similarly a C++ implementation should be simply added to Bitcoin Core as
a bloom filter replacement and made available over the P2P network.

-- 
'peter'[:-1]@petertodd.org
09bc28e08b41a74801c5878bf87978c2486aee7ed8a85778


signature.asc
Description: Digital signature
--
Rapidly troubleshoot problems before they affect your business. Most IT 
organizations don't have a clear picture of how application performance 
affects their revenue. With AppDynamics, you get 100% visibility into your 
Java,.NET,  PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349831iu=/4140/ostg.clktrk___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees

2014-01-06 Thread Mark Friedenbach
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 01/06/2014 10:13 AM, Peter Todd wrote:
 On Sun, Jan 05, 2014 at 07:43:58PM +0100, Thomas Voegtlin wrote:
 I have written a Python-levelDB implementation of this UTXO
 hashtree, which is currently being tested, and will be added to
 Electrum servers.
 
 Along the lines of my recent post on blockchain data:
 
 Is it going to be possible to do partial prefix queries on that
 tree?

There's really two tree structures being talked about here. Correct me
if I'm wrong Thomas, but last time I looked at your code it was still
implementing a 256-way PATRICIA trie, correct? This structure lends
itself to indexing either scriptPubKey or H(scriptPubKey) with
approximately the same performance characteristics, and in the
Ultimate blockchain compression thread there is much debate about
which to use.

In the process of experimentation I've since moved from a 256-way
PATRICIA trie to a bitwise, non-level-compressed trie structure - what
I call proof-updatable trees in the BIP. These have the advantage of
allowing stateless application of one proof to another, and as
consequence enable mining  mempool operations without access to the
UTXO set, so long as proofs are initially provided in the transaction
 block wire format.

The disadvantage is that performance is closely tied to key length,
making H(scriptPubKey) the much more desirable option. I'm sure you
see that as an advantage, however :)

 Also have you considered creating per-block indexes of all 
 scriptPubKeys, spent or unspent, queryable via the same partial
 prefix method?

This would be quite easy to do, separate from the UTXO structure but
using the same trie format.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJSy0iFAAoJEAdzVfsmodw434MQAIA/fDYT7SfMtfLEgDQKhXCn
slRqFEx/HXjvgHHSYnbr9V+8LrGzNvT2ImebbV9ge8VlziAFNGIUq2EYhFs4kHWu
GVm9aL8Jj/27SvM0tRwr9n2XIifKOh2sVINAjbv+UwPv/O+cULU95/b53DEF6aqI
OWxioOR50TPe4t9AevAGVypNLm1DsyDdymhO9xyBN92xGTNj5QKL5hHG3kcsLIl1
7KaxO0w4UC2sdSGj9FeyH1b0zYg8FlzjJHc1CUshHwUwyYo8LRJtRypL5lrayERg
Er/kIGEDovcenNBW8G79l+8VKPfB/lMTssT2pDiQL+1e1fg46CIQxHSyap2JSFTE
jgleRk/+1NK/ZjOQ8dEBPZK3TE1WY3qlm/ekjG/8W5kXqcxzFBoAkeBNXuJ/8UMi
mKe+DTmbp0xnvLO1p+hpugXKfrQSpcFL+ZvJHlFS1lz7O1N3WvuDCNP9El+L6ueM
nFzjr1NTnX0z4vYtscI7qBKVqUrB7Z84c3O/lSYpw4Jilxl4trzV4cn7+AF7KWGM
ktR9JJeIoNcJ2Zx4EpRp6OSwhtLkWZyLpPnidQ2p6ev2ytXpTpGsW/i5XS2w57UD
2IG5E0Q7Xzvd58lI/YollWQcagVOZdyzYXa+wVZoFQ6gLF47andpUmtUJOhI7gxv
T/rWhPhkTMUn8TdvUcV/
=N9zM
-END PGP SIGNATURE-

--
Rapidly troubleshoot problems before they affect your business. Most IT 
organizations don't have a clear picture of how application performance 
affects their revenue. With AppDynamics, you get 100% visibility into your 
Java,.NET,  PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349831iu=/4140/ostg.clktrk
___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees

2014-01-06 Thread Thomas Voegtlin

Le 07/01/2014 01:21, Mark Friedenbach a écrit :
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 On 01/06/2014 10:13 AM, Peter Todd wrote:
 On Sun, Jan 05, 2014 at 07:43:58PM +0100, Thomas Voegtlin wrote:
 I have written a Python-levelDB implementation of this UTXO
 hashtree, which is currently being tested, and will be added to
 Electrum servers.
 Along the lines of my recent post on blockchain data:

 Is it going to be possible to do partial prefix queries on that
 tree?
 There's really two tree structures being talked about here. Correct me
 if I'm wrong Thomas, but last time I looked at your code it was still
 implementing a 256-way PATRICIA trie, correct? This structure lends
 itself to indexing either scriptPubKey or H(scriptPubKey) with
 approximately the same performance characteristics, and in the
 Ultimate blockchain compression thread there is much debate about
 which to use.

You are right. The 256-way branching follows from the fact that
the tree was implemented using a key-value database operating
with byte strings (leveldb). With this implementation constraint,
a different branching would probably be possible but wasteful.

My recent code creates one leaf per unspent, and uses 56-byte
keys built as:

   H(scriptPubKey) + txid + txpos

(This is not pushed yet, it needs cleanup. Previous code created one 
leaf per address)

Partial prefix queries are possible with database iterators.

 In the process of experimentation I've since moved from a 256-way
 PATRICIA trie to a bitwise, non-level-compressed trie structure - what
 I call proof-updatable trees in the BIP. These have the advantage of
 allowing stateless application of one proof to another, and as
 consequence enable mining  mempool operations without access to the
 UTXO set, so long as proofs are initially provided in the transaction
  block wire format.

I see the advantage of doing that, but this looks really far-fetched..
My understanding is that it would require a complete change in the
way clients and miners work. Could such a change be brought iteratively?



--
Rapidly troubleshoot problems before they affect your business. Most IT 
organizations don't have a clear picture of how application performance 
affects their revenue. With AppDynamics, you get 100% visibility into your 
Java,.NET,  PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349831iu=/4140/ostg.clktrk
___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees

2014-01-05 Thread Thomas Voegtlin
Hello and happy new year to this mailing list!


Thank you Mark for the incredible work you've been doing on this.
I am following this very closely, because it is of primary importance
for Electrum.

I have written a Python-levelDB implementation of this UTXO hashtree,
which is currently being tested, and will be added to Electrum servers.

My implementation follows Alan Reiner's idea to store the tree as items
in a key-value database. I believe that a C++ implementation like yours
will be at least an order of magnitude faster, and I am looking forward 
to it.

I too believe that BIPs should define interoperability points, but probably
not implementation details. For the UTXO hashtree, this means that a BIP
should at least specify how the root hash is constructed. This might be the
only thing that needs to be specified.

However, I see no pressing issue with writing a BIP; it might be preferable
to implement and test different options first, and learn from that.

Thomas



Le 20/12/2013 02:47, Mark Friedenbach a écrit :
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Hello fellow bitcoin developers. Included below is the first draft of
 a BIP for a new Merkle-compressed data structure. The need for this
 data structure arose out of the misnamed Ultimate blockchain
 compression project, but it has since been recognized to have many
 other applications.

 In addition to this BIP I am preparing three additional BIPs
 describing the use of this data structure in stateless validation 
 mining, the UBC address index for SPV+ operating modes, document
 timestamping and merged mining.

 A Python implementation of this data structure is available here:

 https://github.com/monetizeio/python-bitcoin

 A C++ implementation is being worked on.

 As per the BIP-1 procedure, I am submitting this rough draft to the
 community for discussion. I welcome all comments and criticisms of
 both form and content.

 - -Mark


 ==Abstract==

 This BIP describes a [http://en.wikipedia.org/wiki/Hash_tree Merkle
 hash tree] variant of the [http://en.wikipedia.org/wiki/Trie
 prefix-tree data structure], ideally suited for encoding key-value
 indices which support memory-efficient proofs.

 ==Motivation==

 There are a number of applications which would benefit from having a
 data structure with the following properties:

 * '''Arbitrary mapping of keys to values.''' A ''key'' can be any
 bytestring, and its ''value'' any other bytestring.
 * '''Duplicate keys disallowed.''' Every key has one, and only one
 value associated with it. Some applications demand assurance that no
 key value is reused, and that this constraint can be checked without
 requiring access to the entire data structure.
 * '''Efficient look-up by key.''' The data structure should support
 sub-linear lookup operations with respect to the number of keys in the
 mapping. Logarithmic time or linear with respect to the length of the
 key should be achievable and would be sufficient for realistic
 applications.
 * '''Merkle compression of mapping structure.''' It should be possible
 to produce a reduced description of the tree consisting of a single
 root hash value which is deterministically calculated from the mapping
 structure.
 * '''Efficient proofs of inclusion.''' It should be possible to
 extract a proof of key/value mapping which is limited in size and
 verification time by the length of the key in the worst case.
 * '''Computation of updates using local information.''' Given a set of
 inclusion proofs, it should be possible to calculate adjustments to
 the local mapping structure (update or deletion of included mappings,
 or insertion between two included mappings which are adjacent in the
 global structure).

 Such applications include committed validation indices which enable
 stateless mining nodes, committed wallet indices which enable
 trust-less querying of the unspent transaction output set by
 codescriptPubKey/code, efficient document time-stamping, and
 secure  efficient merged mining. This BIP describes an authenticated
 prefix tree which has the above properties, but leaves the myriad
 applications to be formalized in future BIPs.

 ==Data structure==

 This BIP defines a binary prefix tree. Such a structure provides a
 mapping of bitstrings (the ''keys'') to bytestrings (the ''values'').
 It is an acyclic binary tree which implicitly encodes keys within the
 traversal path -- a left branch is a 0, and a right branch is a 1.
 Each node is reachable by only one unique path, and reading off the
 branches taken (0 for each left, 1 for each right) as one follows the
 path from root to target yields the node's key.

 The particular binary prefix tree defined by this BIP is a hybrid
 PATRICIA / de la Brandais tree structure.
 [http://en.wikipedia.org/wiki/Radix_tree PATRICIA trees] compress a
 long sequence of non-branching nodes into a single interior node with
 a per-branch ''skip prefix''. This achieves significant savings in
 storage space, root 

Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees

2013-12-20 Thread Mark Friedenbach
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi Jeremy, Let's give a preview of the application-oriented BIPs I
mentioned:

Stateless validation and mining involves prefixing transaction and
block messages with proofs of their UTxO state changes. These are the
operational proofs I describe in the draft, and they work on prefix
trees whose root hashes committed to the coinbase in a soft-fork
upgrade of the validation rules.

Ultimate blockchain compression involves consensus over an address
index, which can be queried over the p2p network by lightweight nodes.
The structure of the index is an authenticated prefix tree, and the
results of such a query is an an inclusion proof.

Document time-stamping and this new method of merged mining use the
same structure: a prefix tree whose root hash value is committed to a
pruneable output of the coinbase transaction. Document timestamp
proofs and merged mining proof-of-works are inclusion proofs over this
tree.

I hope that shows how the BIP directly affects interoperability of the
bitcoin protocol and clients which use these applications. I released
this BIP first to get some feedback on the structure itself, which
will be used by all of the application-specific BIPs which follow.


Stepping back and speaking generically, the purpose of a BIP as I see
it is to standardize details which affect interoperability between
clients. In fact, at a cursory glance only about half of the BIPs deal
with protocol issues directly - the rest deal with local /
user-interface issues like key derivation or JSON-RPC APIs. Even if
none of the applications involved protocol changes, I still think BIPs
like this would be of value in that they serve to standardize things
which are or will seek to become commonly used and widely implemented.

Cheers,
Mark

On 12/19/2013 10:48 PM, Jeremy Spilman wrote:
 Wow there's a lot here to think about. I'm pretty sure I haven't
 grasped the full implications yet.
 
 I see it proposes to also introduce additional BIPs describing the
 use of the data stucture for stateless validation  mining, the UBC
 address index for SPV+ operating modes, document timestamping and
 merged mining.
 
 Can the BIP stand alone as a BIP without some specific changes to
 the protocol or end-user accessible features defined within it? It
 seems like an extremely useful data stucture, but as I understand
 it the purpose of BIPS is defining interoperability points, not
 implementation details?
 
 Unless the tree itself is becoming part of the protocol, seems like
 its spec, test vectors, and reference implementation can live
 elsewhere, but I would love to read about BIPS which use this tree
 to accomplish some amazing scalability or security benefits.
 
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJStChCAAoJEAdzVfsmodw42DcQAIlgkukh5K/XYloIiT5pgaHS
xCZXtOvxpNUep8x35rvEO1ePjvPvUkbUE2jRw2se1rSMkhzw3PpHHtXV/gIOGqUe
WVKeeIM5pZX56sEcEdUQ1pTwB2rmtSNeyCuHl8fLatk8eLhcAHcpv/7esLuAjWCr
EE840s8+q3ltuzKi3nqxK84bwIohgSMKhncfonNp5lMAtug8Itqopq3DPDoxwiK/
qUwSz5UCEMH6oNHnywzhKGUhBErqo4q8IgAKcZYBZZ9n4BRjf4ngoCw9n5wCef8v
tyTvwrg0nSQTQa67cg7RCsY7SisrI9gaMvCYTSvEMKdw9X0aqAX1p0yZpTbV+dIr
Q2ZT6gJmg2sD22zKY1/58oq+PiNO+nRS81OG2znZofsIfhOVW0bIZAQ8+zZtFW40
vXxMuHCNieCK8e7f9A6LLv/Zz7rmNxdQ6cHBEL1nIs1Y4d1FpHJVI2LHi54QmVXf
C5PKF/e7K2eD3LZMNxS818rZaiJJ7qmpjS3rkG2owHyJHEhBJIlkYXfI1YCraT+b
R5AzAh2Oz0Nyb5ChP2VSaecJNjGvRMo7Z6HCytmgAGOEcDDZkxSv0kkprbvchqXx
XziFgA4iSajBKYWPiPLGMADfMPT6zd4fhDjyaN8+LPO3d3ZK1VwmQDLRQ3DxfeIP
RgchHR/pS77XI7hCFwtN
=ao17
-END PGP SIGNATURE-

--
Rapidly troubleshoot problems before they affect your business. Most IT 
organizations don't have a clear picture of how application performance 
affects their revenue. With AppDynamics, you get 100% visibility into your 
Java,.NET,  PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349831iu=/4140/ostg.clktrk
___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees

2013-12-20 Thread Peter Todd
On Fri, Dec 20, 2013 at 03:10:50AM -0800, Mark Friedenbach wrote:
 On 12/20/2013 02:48 AM, Peter Todd wrote:
  On Thu, Dec 19, 2013 at 05:47:52PM -0800, Mark Friedenbach wrote:
  This BIP describes the authenticated prefix tree and its many 
  variations in terms of its serialized representation. Additional
  BIPs describe the application of authenticated prefix trees to
  such applications as committed indices, document time-stamping,
  and merged mining.
  
  Could you expand more on how prefix trees could be used for 
  time-stamping and merged mining?
 
 The root hash of a prefix tree is placed in the coinbase at a location
 standardized by convention.

Right, last txout in an OP_RETURN like we discussed.

 For document time-stamping, the key can be
 the hash of the document.

Don't you mean the value is the hash of the document and the key is
irrelevant?

 For merged mining, the key is the hash of
 the genesis block of the altchain, and the value is the hash of the
 aux-pow (for p2pool, the share hash).

What's the advantage over the direction-based system I proposed before?
Seems to me the code required to validate the proof is significantly
more complex in your scheme.

http://www.mail-archive.com/bitcoin-development@lists.sourceforge.net/msg03149.html

 In the system I have in mind this adds 43 bytes to the coinbase
 transaction,

By 43 bytes you mean the whole op_return txout right?

  dict = AuthTree() dict['Curie'] = VARINT(1898) 
  dict('Einstein') = VARINT(1905) dict['Fleming'] =
  VARINT(1928) dict['中本'] = VARINT(2009)
  
  I'd be inclined to leave the unicode out of the code examples as
  many editors and shells still don't copy-and-paste it nicely. Using
  it in BIP documents themselves is fine and often has advantages re:
  typesetting, but using it in crypto examples like this just makes
  it harder to reproduce the results by hand unnecessarily.
 
 Thanks for the feedback, I rather agree. When I was creating that
 example for some reason I wanted the right branch of the root node to
 be used, which is difficult when only 7-bit ASCII keys are used. But I
 don't think the illustrative point I had in mind ended up being
 particularly relevant, so I'll rework it.

That example is python, so I'd suggest just using escape sequences
myself. You probably also should include the b prefix to make the
strings explicitly binary for py3 compatibility, ie dict[b'\xbe\xef']

-- 
'peter'[:-1]@petertodd.org
000216e3750a9ad9584395352d728a3c543844eab3bfc9ce1073


signature.asc
Description: Digital signature
--
Rapidly troubleshoot problems before they affect your business. Most IT 
organizations don't have a clear picture of how application performance 
affects their revenue. With AppDynamics, you get 100% visibility into your 
Java,.NET,  PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349831iu=/4140/ostg.clktrk___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees

2013-12-20 Thread Mark Friedenbach
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

(Sorry Peter, this was meant for the whole list:)

On 12/20/2013 05:17 AM, Peter Todd wrote:
 I've thought about this for awhile and come to the conclusion that 
 UTXO commitments are a really bad idea. I myself wanted to see them
 implemented about a year ago for fidelity bonded banks, but I've
 changed my mind and I hope you do too.
 
 They force miners and every full node with SPV clients to store the
 entire UTXO set in perpetuity.

This is incorrect. If the slower proof-updatable hashes are used, then
mining only requires what I've called operational proofs to be
attached to received transactions and blocks.

Access to the UTXO set is required to make new transactions, at least
for the outputs of the transaction, but I do not believe this is as
significant a problem as you do. It is a service that can be
outsourced for a minimal fee - include an explicit output of the
necessary amount to a scriptPubKey specified by the archival node, and
they will make sure the proper proofs are attached.

 This is bad by itself, but then they make it even worse by making 
 Bitcoin really useful and convenient to use as a decentralized 
 database; UTXO commitments make it easy and convenient to
 implement systems like Namecoin on top of Bitcoin, yet we don't
 have the UTXO expiration that might make such uses reasonable.
 Right now the UTXO set is reasonable small - ~300MB - but that can
 and will change if we make it an attractive way to store data.
 UTXO commitments do exactly that.

You might have to explain this to me, but it is not clear to me how
the validation index could be twisted into providing a Namecoin-like
system. Or the address index either, which I presume is what you are
referring to. Namecoin works by assigning domains to outputs, and then
tracking ownership and configuration of that domain through chains of
outputs. But the UTXO set doesn't contain connecting information. At
best all it would be is a glorified, and expensive time-stamper,
unattractive because there are already better solutions.

 You're also *not* giving users what they actually want: the 
 transactions associated with their wallets. Even though Electrum 
 could easily work via a pure UTXO database they implemented 
 transaction lookup instead; Electrum servers cough up every 
 transaction associated with a user's wallet. If you're going to do 
 that, it's just as easy to do per-block lookup trees which don't 
 force the UTXO set to be stored.

At the cost of having the supposedly lightweight client query for each
of its coins on every single block, to construct a negative
proof-of-spend.

 There's also a more subtle issue: the security model of UTXO 
 commitments sucks. It encourages wallets to essentially trust 
 single confirmations because it's unlikely that nodes will want to 
 store the multiple copies of the UTXO set required to provide
 proof of multiple confirmations. Basically the issue is when you
 start your wallet you get a proof of UTXO set for the most recent
 block; that's just one confirmation. To get more confirmations you
 have to wait for subsequent blocks, checking the set on each block.
 Per block indexes on the other hand naturally lead wallets to
 count confirmations properly.

I don't think this is true, or at least you are not considering
available optimizations. You certainly don't need to store multiple
copies of the UTXO set.

I'm a little confused as to the exact situation you are describing.
When a key is loaded into a wallet, or a wallet comes online after a
significant absence, it looks for coins in the current UTXO set. If
any coins are found, their attached transaction record has a block
height field, so the confirmation count can be derived from that. As
blocks go by that count is naturally increased. I'm not sure how this
is different from the current situation.

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJStI9aAAoJEAdzVfsmodw4IooP/1uK9cvP1vxXyQRbAHf9oFXw
AmZ8p1+t8f6MHUpjkv/Xn0poFNU8qSnNz65drQdq8ErcJnqe4V3Wt6G32/uCxvZs
6AX6bRYQIfhHY0DBPgfacO5/ALdlnS4NdjWFCA5hHDgLd30BpbU1WK1ze985TXrd
+ucQkzcMYEDW2lb+sFvfhpi9ZPFd34ZrNzH//oS794eYKWAmj7jXqdgxk5AKat61
Xileq5beE4xom3pChXc3PtIJKsoil5SjE20/FW52wcCdyaEFG0kwl937pEGjQnlP
mylK/ilfZ6cvRC8MmVnl/6AC4V2hjB4Ncej03jG3JI2FdaJEOHuHg0uh8/Zl1I4A
YVIKyrHQhQb/VGsfXtW3zokHzDeEtJwlx+PPFaLc9QurFirNjSnenhbw4Vpbg3Xt
dH1Qd9xWcT85a19Oz8Q4rt3z7UmX9J/geZrUHCuPtr47yXU0e1Cc6ZP9zDYNtfKU
q6MjNZiaLJ/Wp0n4IeQ/4/wqy0rM/psP9i5d6IdP96tayVM9aKj5Lh9lU/Od5wGO
2PPB4kvhJfMbx3o+S7UK5vra7ysZzULpoVGDpUR3xRM72l//vlNhSLK5nILVO3r8
sIC5+3WoZLUKvuNo45/BDxXHZajrWLCU84WrwHVm1u7SHfBQcoES/rhcx2zlgyx0
/Iwxsgb7Fznl+eM2bEpZ
=TtaV
-END PGP SIGNATURE-

--
Rapidly troubleshoot problems before they affect your business. Most IT 
organizations 

Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees

2013-12-20 Thread Gregory Maxwell
On Thu, Dec 19, 2013 at 5:47 PM, Mark Friedenbach m...@monetize.io wrote:
 Hello fellow bitcoin developers. Included below is the first draft of
 a BIP for a new Merkle-compressed data structure. The need for this
 data structure arose out of the misnamed Ultimate blockchain
 compression project, but it has since been recognized to have many
 other applications.

A couple very early comments— I shared some of these with you on IRC
but I thought I'd post them to make them more likely to not get lost.

Whats a VARCHAR()  A zero terminated string?  A length prefixed
string? How is the length encoded?  Hopefully not in a way that has
redundancy, since things that don't survive a serialization round trip
is a major trap.

Is the 'middle' the best place for the extradata? Have you
contemplated the possibility that some applications might use midstate
compression?

On that general subject, since the structure here pretty much always
guarantees two compression function invocations. SHA512/256 might
actually be faster in this application.

Re: using sha256 instead of sha256^2, we need to think carefully about
the implications of Merkle-Damgard generic length extension attacks.
It would be unfortunately to introduce them here, even though they're
currently mostly theoretical for sha256.

WRT hash function performance, hash functions are so ludicrously fast
(and will be more so as processors get SHA2 instructions) that the
performance of the raw compression function would hardly ever be a
performance consideration unless you're using a slow interpreted
language (... and that sounds like a personal problem to me). So I
don't think CPU performance should be a major consideration in this
BIP.

What I do think should be a consideration is the cost of validating
the structure under a zero-knowledge proof. An example application is
a blind proof for a SIN or a proof of how much coin you control... or
even a proof that a block was a correctly validated one, and in these
cases additional compression function calls are indeed pretty
expensive. But they're not the only cost, any conditional logic in the
hash tree evaluation is expensive, and particular, I think that any
place where data from children will be combined with a variable offset
(especially if its not word aligned) would potentially be rather
expensive.

I'm unconvinced about the prefix tree compressed applications, since
they break compact update proofs.  If we used them in the Bitcoin
network they could only be used for data where all verifying nodes had
all their data under the tree. I think they add a lot of complexity to
the BIP (esp from people reading the wrong section), so perhaps they
should be split into another document?

In any case, I want to thank you for talking the time to write this
up. You've been working on this stuff for a while and I think it will
be lead to useful results, even if we don't end up using how it was
originally envisioned.

--
Rapidly troubleshoot problems before they affect your business. Most IT 
organizations don't have a clear picture of how application performance 
affects their revenue. With AppDynamics, you get 100% visibility into your 
Java,.NET,  PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349831iu=/4140/ostg.clktrk
___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees

2013-12-20 Thread Mark Friedenbach
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 12/20/2013 11:48 AM, Gregory Maxwell wrote:
 A couple very early comments— I shared some of these with you on
 IRC but I thought I'd post them to make them more likely to not get
 lost.

I got the inputs from IRC, but thank you for posting to the list so
that others can see and review.

 Whats a VARCHAR()  A zero terminated string?  A length prefixed 
 string? How is the length encoded?  Hopefully not in a way that
 has redundancy, since things that don't survive a serialization
 round trip is a major trap.

A length-prefixed string, using the shortest representation VARINT for
the length. Same as how scripts are serialized in transactions.

 Is the 'middle' the best place for the extradata? Have you 
 contemplated the possibility that some applications might use
 midstate compression?

Yes I considered midstate compression which is why the branch hashes
come last, but extra was an oversight. In every application I've
considered it's either not used (and therefore a single byte), or
updated whenever the node or its children updates.

Honestly I don't expect midestate compression to offer much since in
the nodes that are updated frequently it is unlikely that there will
be enough static data at the front to fill even a 512 bit block of the
smaller hash function.

But it doesn't hurt to prepare just in case. I'll move it to the end.

 On that general subject, since the structure here pretty much
 always guarantees two compression function invocations. SHA512/256
 might actually be faster in this application.

Yes, this is a great suggestion. Moving to SHA-512/256 will let most
inner nodes fit inside a single block, so long as the extra field is
not too long. Also apparently SHA-512 is faster on 64-bit CPUs, which
is a nice advantage. I didn't know that.

I'm concerned about speed but I did not go with a faster hash function
because users are more likely to have hardware acceleration for the
SHA-2 family.

 Re: using sha256 instead of sha256^2, we need to think carefully
 about the implications of Merkle-Damgard generic length extension
 attacks. It would be unfortunately to introduce them here, even
 though they're currently mostly theoretical for sha256.

The serialization format encodes lengths in such a way that you cannot
extend the data structure merely by appending bits. You would have to
change the prior, already hashed bits as well. I believe this makes it
immune to length extension attacks.

 WRT hash function performance, hash functions are so ludicrously
 fast (and will be more so as processors get SHA2 instructions) that
 the performance of the raw compression function would hardly ever
 be a performance consideration unless you're using a slow
 interpreted language (... and that sounds like a personal problem
 to me). So I don't think CPU performance should be a major
 consideration in this BIP.

Well.. the UTXO tree is big. Let's assume 5,000 transactions per
block, with an average of 3 inputs/outputs per transaction. This is
close to the worst-case scenario with the current block size. That's
15,000 insert, update, or delete operations.

The number of hashes required when level-compression is used is log2
the number of items in the tree, which for bitcoin is currently about
2.5 million transactions. So that's about ~21 hashes per input/ouput,
or 315,000 hash operations. A CPU is able to do about 100,000 hashes
per second per core, that'll probably take about a second on a modern
4- or 8-core machine.

For updatable proofs, the number of hash operations is equal to the
number of bits in the key, which for the validation index is always
256. That means 3.84 million hashes, or about 10 seconds on a 4-core
machine.

The numbers for the wallet index are worse, as it scales with the
number of outputs, which is necessarily larger, and the keys are longer.

This is not an insignificant cost in the near term, although it is the
type of operation that could be easily offloaded to a GPU or FPGA.

 What I do think should be a consideration is the cost of
 validating the structure under a zero-knowledge proof. An example
 application is a blind proof for a SIN or a proof of how much coin
 you control... or even a proof that a block was a correctly
 validated one, and in these cases additional compression function
 calls are indeed pretty expensive. But they're not the only cost,
 any conditional logic in the hash tree evaluation is expensive, and
 particular, I think that any place where data from children will be
 combined with a variable offset (especially if its not word
 aligned) would potentially be rather expensive.

This is something I know less about, and I welcome constructive input.
There is *no* reason that the hash serialization needs to have fancy
space-saving features. You could even make the SIG_HASH node
serialization into fixed-size, word-aligned data structures.

But this is absolutely not my field, and I may need some hand-holding.
Do 

[Bitcoin-development] BIP proposal: Authenticated prefix trees

2013-12-19 Thread Mark Friedenbach
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hello fellow bitcoin developers. Included below is the first draft of
a BIP for a new Merkle-compressed data structure. The need for this
data structure arose out of the misnamed Ultimate blockchain
compression project, but it has since been recognized to have many
other applications.

In addition to this BIP I am preparing three additional BIPs
describing the use of this data structure in stateless validation 
mining, the UBC address index for SPV+ operating modes, document
timestamping and merged mining.

A Python implementation of this data structure is available here:

https://github.com/monetizeio/python-bitcoin

A C++ implementation is being worked on.

As per the BIP-1 procedure, I am submitting this rough draft to the
community for discussion. I welcome all comments and criticisms of
both form and content.

- -Mark


==Abstract==

This BIP describes a [http://en.wikipedia.org/wiki/Hash_tree Merkle
hash tree] variant of the [http://en.wikipedia.org/wiki/Trie
prefix-tree data structure], ideally suited for encoding key-value
indices which support memory-efficient proofs.

==Motivation==

There are a number of applications which would benefit from having a
data structure with the following properties:

* '''Arbitrary mapping of keys to values.''' A ''key'' can be any
bytestring, and its ''value'' any other bytestring.
* '''Duplicate keys disallowed.''' Every key has one, and only one
value associated with it. Some applications demand assurance that no
key value is reused, and that this constraint can be checked without
requiring access to the entire data structure.
* '''Efficient look-up by key.''' The data structure should support
sub-linear lookup operations with respect to the number of keys in the
mapping. Logarithmic time or linear with respect to the length of the
key should be achievable and would be sufficient for realistic
applications.
* '''Merkle compression of mapping structure.''' It should be possible
to produce a reduced description of the tree consisting of a single
root hash value which is deterministically calculated from the mapping
structure.
* '''Efficient proofs of inclusion.''' It should be possible to
extract a proof of key/value mapping which is limited in size and
verification time by the length of the key in the worst case.
* '''Computation of updates using local information.''' Given a set of
inclusion proofs, it should be possible to calculate adjustments to
the local mapping structure (update or deletion of included mappings,
or insertion between two included mappings which are adjacent in the
global structure).

Such applications include committed validation indices which enable
stateless mining nodes, committed wallet indices which enable
trust-less querying of the unspent transaction output set by
codescriptPubKey/code, efficient document time-stamping, and
secure  efficient merged mining. This BIP describes an authenticated
prefix tree which has the above properties, but leaves the myriad
applications to be formalized in future BIPs.

==Data structure==

This BIP defines a binary prefix tree. Such a structure provides a
mapping of bitstrings (the ''keys'') to bytestrings (the ''values'').
It is an acyclic binary tree which implicitly encodes keys within the
traversal path -- a left branch is a 0, and a right branch is a 1.
Each node is reachable by only one unique path, and reading off the
branches taken (0 for each left, 1 for each right) as one follows the
path from root to target yields the node's key.

The particular binary prefix tree defined by this BIP is a hybrid
PATRICIA / de la Brandais tree structure.
[http://en.wikipedia.org/wiki/Radix_tree PATRICIA trees] compress a
long sequence of non-branching nodes into a single interior node with
a per-branch ''skip prefix''. This achieves significant savings in
storage space, root hash calculation, and traversal time.

A de la Brandais trie achieves compression by only storing branches
actually taken in a node. The space savings are minimal for a binary
tree, but place the serialized size of a non-branching interior node
under the SHA-256 block size, thereby reducing the number of hash
operations required to perform updates and validate proofs.

This BIP describes the authenticated prefix tree and its many
variations in terms of its serialized representation. Additional BIPs
describe the application of authenticated prefix trees to such
applications as committed indices, document time-stamping, and merged
mining.

==Serialization format==

As a hierarchical structure, the serialization of an entire tree is
the serialization of its root node. A serialized node is the
concatenation of five structures:

node := flags || VARCHAR(extra) || value || left || right

The codeflags/code is a single byte field whose composite values
determine the bytes that follow.

flags = (left_flags   0) |
(right_flags  2) |
(has_value4) |