[Bitcoin-development] Service bits for pruned nodes

2013-04-28 Thread Pieter Wuille
Hello all,

I think it is time to move forward with pruning nodes, i.e. nodes that
fully validate and relay blocks and transactions, but which do not keep
(all) historic blocks around, and thus cannot be queried for these.

The biggest roadblock is making sure new and old nodes that start up are
able to find nodes to synchronize from. To help them find peers, I would
like to propose adding two extra service bits to the P2P protocol:
* NODE_VALIDATE: relay and validate blocks and transactions, but is only
guaranteed to answer getdata requests for (recently) relayed blocks and
transactions, and mempool transactions.
* NODE_BLOCKS_2016: can be queried for the last 2016 blocks, but without
guarantee for relaying/validating new blocks and transactions.
* NODE_NETWORK (which existed before) will imply NODE_VALIDATE and
guarantee availability of all historic blocks.
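
As a rough illustration of how a peer might interpret these bits, here is a minimal Python sketch. Only NODE_NETWORK (bit 0) exists in the protocol today; the bit positions used for NODE_VALIDATE and NODE_BLOCKS_2016 and the helper names are assumptions made for this example, not part of any specification.

```python
# Service bits as integer flags. NODE_NETWORK is the existing bit;
# the other two positions are hypothetical placeholders.
NODE_NETWORK = 1 << 0
NODE_VALIDATE = 1 << 1      # assumed position
NODE_BLOCKS_2016 = 1 << 2   # assumed position

RECENT_WINDOW = 2016        # number of most recent blocks covered


def relays_and_validates(services: int) -> bool:
    """NODE_NETWORK implies NODE_VALIDATE under the proposal."""
    return bool(services & (NODE_NETWORK | NODE_VALIDATE))


def can_serve_block(services: int, tip_height: int, height: int) -> bool:
    """Whether a peer advertising `services` promises to serve block `height`."""
    if height < 0 or height > tip_height:
        return False
    if services & NODE_NETWORK:
        return True  # all historic blocks are guaranteed
    if services & NODE_BLOCKS_2016:
        # only the last 2016 blocks: heights tip-2015 .. tip
        return height > tip_height - RECENT_WINDOW
    return False
```

A node that sets only NODE_VALIDATE makes no promise about any stored block, which is why the last branch returns False.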

The idea is to separate the different responsibilities of network nodes
into separate bits, so they can - at some point - be
implemented independently. Perhaps we want more than just one degree (2016
blocks), maybe also 144 or 21, but those can be added later if
necessary. I monitored the frequency of block depths requested from my
public node, and got this frequency distribution:
http://bitcoin.sipa.be/depth-small.png so it seems 2016 nicely matches the
set of frequently-requested blocks (indicating that few nodes are offline
for more than 2 weeks consecutively).

I'll write a BIP to formalize this, but wanted to get an idea of how much
support there is for a change like this.

Cheers,

-- 
Pieter
--
Try New Relic Now  We'll Send You this Cool Shirt
New Relic is the only SaaS-based application performance monitoring service 
that delivers powerful full stack analytics. Optimize and monitor your
browser, app,  servers with just a few lines of code. Try New Relic
and get this awesome Nerd Life shirt! http://p.sf.net/sfu/newrelic_d2d_apr___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] Service bits for pruned nodes

2013-04-28 Thread Mike Hearn
I'd imagined that nodes would be able to pick their own ranges to keep
rather than have fixed chosen intervals. Everything or two weeks is
rather restrictive - presumably node operators are constrained by physical
disk space, which means the quantity of blocks they would want to keep can
vary with sizes of blocks, cost of storage, etc.

Adding new fields to the addr message and relaying those fields to newer
nodes means every node could advertise the height at which it pruned. I
know it means a longer time before the data is available everywhere vs
service bits, but it seems like most nodes won't be pruning right away
anyway. There's plenty of time for upgrades. If an old node connected to a
new node and getdata-d blocks that had been pruned, immediate disconnection
should make the old node go find a different one. It means the combination
of old node+not run for a long time might take a while before it can find a
node that has what it wants, but that doesn't seem like a big deal.
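
A minimal Python sketch of the peer selection this implies, assuming a hypothetical extra addr field (called `pruned_below` here) that carries the lowest block height a node still stores. The field name, the `PeerAddr` type and the `try_fetch` callback are all invented for illustration; a `None` result from `try_fetch` stands in for the peer disconnecting.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class PeerAddr:
    host: str
    # Hypothetical addr field: lowest block height still stored.
    # None models an entry that carries no such information.
    pruned_below: Optional[int] = None


def candidates_for(height: int, peers: Iterable[PeerAddr]) -> List[PeerAddr]:
    """Peers worth asking for block `height`: known holders first,
    then peers with no advertised height; known-pruned peers are skipped."""
    known, unknown = [], []
    for peer in peers:
        if peer.pruned_below is None:
            unknown.append(peer)
        elif peer.pruned_below <= height:
            known.append(peer)
    return known + unknown


def fetch_block(height: int, peers: Iterable[PeerAddr],
                try_fetch: Callable[[PeerAddr, int], Optional[bytes]]) -> Optional[bytes]:
    """Try candidates in order; a None result (e.g. the peer disconnected
    because the block was pruned) just moves on to the next peer."""
    for peer in candidates_for(height, peers):
        block = try_fetch(peer, height)
        if block is not None:
            return block
    return None
```

An old node that cannot read the new field behaves as if every entry were `None`: it simply walks the list until some peer answers, which is the slower path described above.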

What is the use case for NODE_VALIDATE? Nodes that throw away blocks almost
immediately? Why would a node do that?





Re: [Bitcoin-development] Service bits for pruned nodes

2013-04-28 Thread Pieter Wuille
On Sun, Apr 28, 2013 at 6:29 PM, Mike Hearn m...@plan99.net wrote:

 I'd imagined that nodes would be able to pick their own ranges to keep
 rather than have fixed chosen intervals. Everything or two weeks is
 rather restrictive - presumably node operators are constrained by physical
 disk space, which means the quantity of blocks they would want to keep can
 vary with sizes of blocks, cost of storage, etc.


Sure, that's why eventually several levels may be useful.

Adding new fields to the addr message and relaying those fields to newer
 nodes means every node could advertise the height at which it pruned. I
 know it means a longer time before the data is available everywhere vs
 service bits, but it seems like most nodes won't be pruning right away
 anyway. There's plenty of time for upgrades.


That's a more flexible model, indeed. I'm not sure how important speed of
propagation will be though - it may be very slow, given that there are
10s of IPs circulating, and only a few are relayed in one go between
nodes. Even then, I'd like to see the relay/validation responsibility
split off from the responsibility of serving historic data, with separate
service bits for each.


 If an old node connected to a new node and getdata-d blocks that had been
 pruned, immediate disconnection should make the old node go find a
 different one. It means the combination of old node+not run for a long time
 might take a while before it can find a node that has what it wants, but
 that doesn't seem like a big deal.


Disconnecting in case something is requested that isn't served seems like
an acceptable behaviour, yes. A specific message indicating data is pruned
may be more flexible, but more complex to handle too.

What is the use case for NODE_VALIDATE? Nodes that throw away blocks almost
 immediately? Why would a node do that?


NODE_VALIDATE doesn't say anything about which blocks are available, it
just means it relays and validates (and thus is not an SPV node). It can be
combined with NODE_BLOCKS_2016 if those blocks are also served.

The reason for splitting them is that I think over time these may be
handled by different implementations. You could have stupid
storage/bandwidth nodes that just keep the blockchain around, and others
that validate it. Even if that doesn't happen implementation-wise, I think
these are sufficiently independent functions to start thinking about them
as such.

-- 
Pieter


Re: [Bitcoin-development] Service bits for pruned nodes

2013-04-28 Thread Gregory Maxwell
On Sun, Apr 28, 2013 at 9:29 AM, Mike Hearn m...@plan99.net wrote:
 I'd imagined that nodes would be able to pick their own ranges to keep
 rather than have fixed chosen intervals. Everything or two weeks is rather

N most recent is special for two reasons: it meshes well with actual demand,
and the data is required for reorganization.

So whatever we do for historic data, N most recent should be treated
specially.

But I also agree that it's important that everything be splittable into ranges,
because otherwise, when having to choose between serving historic data
and saving, say, 40 GB of storage, a great many are going to choose not to
serve historic data. So although nodes may be willing to contribute 4-39 GB
of storage to the network, there will be no good way for them to do so, and
we may end up with too few copies of the historic data available.

As can be seen in the graph, once you get past the most recent 4000
blocks the probability is fairly uniform, so N most recent is not a
good way to divide load for the older blocks. But simple ranges, perhaps
quantized to groups of 100 or 1000 blocks or something, would work fine.

This doesn't have to come in the first cut, however, and it needs new
addr messages in any case.
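
A minimal Python sketch of quantized ranges, assuming groups of 1000 blocks and a node that always keeps the groups covering the most recent 2016 blocks plus a few historic groups picked uniformly at random. The random choice, the function names and the representation as a set of group indices are assumptions for illustration; nothing here is a proposed wire format.

```python
import random
from typing import Set

GROUP_SIZE = 1000  # quantization unit (100 or 1000 were the values floated)


def group_of(height: int) -> int:
    """Index of the fixed-size group that contains block `height`."""
    return height // GROUP_SIZE


def recent_groups(tip_height: int, recent_blocks: int = 2016) -> Set[int]:
    """Groups needed to cover the most recent `recent_blocks` blocks."""
    first = max(0, tip_height - recent_blocks + 1)
    return set(range(group_of(first), group_of(tip_height) + 1))


def pick_historic_groups(tip_height: int, count: int,
                         rng: random.Random) -> Set[int]:
    """Choose up to `count` older groups uniformly at random, since demand
    for blocks beyond the recent window looks roughly uniform."""
    recent = recent_groups(tip_height)
    historic = [g for g in range(group_of(tip_height) + 1) if g not in recent]
    return set(rng.sample(historic, min(count, len(historic))))


def serves(groups: Set[int], height: int) -> bool:
    """Whether a node storing `groups` can serve block `height`."""
    return group_of(height) in groups
```

A node with room for, say, five extra groups would advertise `recent_groups(tip) | pick_historic_groups(tip, 5, random.Random())`, and the number of extra groups could default from free disk space.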



Re: [Bitcoin-development] Service bits for pruned nodes

2013-04-28 Thread Gregory Maxwell
On Sun, Apr 28, 2013 at 7:57 PM, John Dillon
john.dillon...@googlemail.com wrote:
 Have we considered just leaving that problem to a different protocol such as
 BitTorrent? Offering up a few GB of storage capacity is a nice idea but it
 means we would soon have to add structure to the network to allow nodes to
 find each other to actually get that data. BitTorrent already has that issue
 thought through carefully with its DHT support.

I think this is not a great idea on a couple of levels.

Least importantly, our own experience with tracker-less torrents on
the bootstrap files is that they don't work very well in practice, and
that's without someone trying to DoS attack it.

More importantly, I think it's very important that the process of
offering up more storage not take any more steps. The software could
have user-overridable defaults based on free disk space to make
contributing painless. This isn't possible if it takes extra software,
requires opening additional ports, etc. It also means that someone
would have to be constantly creating new torrents, there would be
issues with people only seeding the old ones, etc.

It's also the case that BitTorrent is blocked on many networks and is
confused with illicit copying. We would have the same problems with
that as we had with IRC being confused with botnets.

We already have to worry about nodes finding each other just for basic
operation. The only addition this requires is being able to advertise
what parts of the chain they have.

 What are the logistics of either integrating a DHT capable BitTorrent client,
 or just calling out to some library? We could still use the Bitcoin network to
 bootstrap the BitTorrent DHT.

Using Bitcoin to bootstrap the BitTorrent DHT would probably make it
more reliable, but then again it might cause commercial services that
are in the business of poisoning the BitTorrent DHT to target the
Bitcoin network.

Integration also brings up the question of network exposed attack surface.

Seems like it would be more work than just adding the ability to add
ranges to address messages. I think we already want to revise the
address message format in order to have signed flags and to support
I2P peers.



Re: [Bitcoin-development] Service bits for pruned nodes

2013-04-28 Thread Robert Backhaus
While I like the idea of a client using a DHT blockchain or UTXO list, I
don't think that the reference client is the place for it. But it would
make for a very interesting experimental project!





Re: [Bitcoin-development] Service bits for pruned nodes

2013-04-28 Thread John Dillon

On Mon, Apr 29, 2013 at 3:36 AM, Gregory Maxwell gmaxw...@gmail.com wrote:
 On Sun, Apr 28, 2013 at 7:57 PM, John Dillon
 john.dillon...@googlemail.com wrote:
 Have we considered just leaving that problem to a different protocol such as
 BitTorrent? Offering up a few GB of storage capacity is a nice idea but it
 means we would soon have to add structure to the network to allow nodes to
 find each other to actually get that data. BitTorrent already has that issue
 thought through carefully with its DHT support.

 I think this is not a great idea on a couple of levels.

 Least importantly, our own experience with tracker-less torrents on
 the bootstrap files is that they don't work very well in practice, and
 that's without someone trying to DoS attack it.

Unfortunate. What makes them not work out? DHT torrents seem pretty popular.

 More importantly, I think it's very important that the process of
 offering up more storage not take any more steps. The software could
 have user-overridable defaults based on free disk space to make
 contributing painless. This isn't possible if it takes extra software,
 requires opening additional ports, etc. It also means that someone
 would have to be constantly creating new torrents, there would be
 issues with people only seeding the old ones, etc.

Now don't get me wrong, I'm not proposing we do this if it requires additional
steps or other software. I only mean if it is possible to integrate the
BitTorrent technology into Bitcoin in an easy, automatic fashion. Yes,
part of that may have to be finding a way to re-use the existing port, for
instance.

 We already have to worry about nodes finding each other just for basic
 operation. The only addition this requires is being able to advertise
 what parts of the chain they have.

Sure, I guess my concern is more: how do you find the specific part of the
chain you need without some structure to the network? Although I guess it may
be enough to just add that structure, or to depend on just walking the nodes
advertising themselves until you find what you want.

We can build this stuff incrementally, I'll agree. It won't be the case that one
in a thousand nodes serve up the part of the chain you need overnight. So maybe
I am over-engineering the solution with BitTorrent.

 Using Bitcoin to bootstrap the BitTorrent DHT would probably make it
 more reliable, but then again it might cause commercial services that
 are in the business of poisoning the BitTorrent DHT to target the
 Bitcoin network.

Good point. Sadly one that may apply to the Tor network too in the future.



Re: [Bitcoin-development] Service bits for pruned nodes

2013-04-28 Thread Peter Todd
On Mon, Apr 29, 2013 at 03:48:18AM +, John Dillon wrote:
 We can build this stuff incrementally, I'll agree. It won't be the case that
 one in a thousand nodes serve up the part of the chain you need overnight. So
 maybe I am over-engineering the solution with BitTorrent.

I think that pretty much sums it up.

With the block range served in the announce message you just need to find
an announcement with the right range, and at worst connect to a few more
nodes to get what you need. It will be a long time before the bandwidth
used for finding a node with the part of the chain that you need is a
significant fraction of the load required for downloading the data
itself.

Remember that BitTorrent's DHT is a system giving you access to tens of
petabytes worth of data. The Bitcoin blockchain on the other hand simply
can't grow more than 57GiB per year. It's a cute idea though.
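
A back-of-the-envelope version of that bound in Python, assuming a 1,000,000-byte block size limit and one block every 10 minutes on average. Both numbers are assumptions supplied here; the message does not say how its figure was derived, and these inputs give roughly 49 GiB per year (about 51 GiB if a block is taken as 1 MiB), the same order of magnitude as the figure above.

```python
MAX_BLOCK_BYTES = 1_000_000   # assumed block size limit
BLOCKS_PER_DAY = 24 * 6       # assumed average of one block per 10 minutes
DAYS_PER_YEAR = 365


def max_growth_bytes_per_year(max_block_bytes: int = MAX_BLOCK_BYTES) -> int:
    """Upper bound on yearly chain growth if every block is full."""
    return max_block_bytes * BLOCKS_PER_DAY * DAYS_PER_YEAR


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    print(f"{to_gib(max_growth_bytes_per_year()):.2f} GiB/year")
```

Either way the total is tiny next to what a DHT is designed to index, which is the point being made.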


Also, while we're talking about the initial download:

http://blockchainbymail.com

Lots of options out there.

-- 
'peter'[:-1]@petertodd.org




[Bitcoin-development] Downloading blockchain

2013-04-28 Thread Jesus Cea

I would like to download the entire blockchain by hand (kind of), but
I can't find software for it. The closest thing I found is the
Bitcoin-protocol-test-harness code, but it is two years old and
seems not to work with the current Bitcoin network.

Python appreciated.

Thanks for your help.

-- 
Jesús Cea Avión
j...@jcea.es - http://www.jcea.es/
Twitter: @jcea
jabber / xmpp:j...@jabber.org
Things are not so easy
My name is Dump, Core Dump
Love is placing your happiness in the happiness of another - Leibniz
