Re: [Lightning-dev] Routemap scaling (was: Just in Time Routing (JIT-Routing) and a channel rebalancing heuristic as an add on for improved routing success in BOLT 1.0)

m.a.holden via Lightning-dev Fri, 05 Apr 2019 19:42:29 -0700

Good morning ZmnSCPxj,

I'll try to clarify my proposal further, but also have some questions about 
yours.


---

> Now, it seems to me what you propose, is to have octrees contain octrees, and 
> so on.

There's one global tree, which is the same for all users. Every node in the 
tree has a bucket and exactly 4 child nodes, except leaves have no children. 
The tree has a max depth, which each client sets itself. The theoretical 
maximum being HASHLEN/2 (In practice, we're likely to be concerned about <8). 
Note that every parent's bucket contains all of the information of all of its 
children's buckets - meaning the root node of the global quadtree is equivalent 
to the unfiltered global network. Nodes pick a depth and concern themselves 
with the bucket at that depth, unless it overflows, in which case they increase 
the depth by 1.

> Now let us return, to my proposal of a distance measurement.
> This effectively maps the LN universe onto a circle.
> Each node commits itself to storing an arc of this circle, and possibly 
> various other nodes it happens to be directly connected to that may be far 
> from the arc near it.

The quadtree can also be looked at from the perspective of a circle, assuming 
there is a reference point P on it. Each bucket is represented by an arc of 
size 2pi/(4^depth), and the PKH-prefix represents a displacement of the arc 
from P (Eg, the bucket 00b would represent an arc from P to pi/2+P). Bucket 00b 
includes all gossip information for the buckets 0000b, 0001b, 0010b and 0011b, 
which are sub-arcs of it, and so forth. Spilling a bucket is the same as 
narrowing the arc to 1/4 its previous size.

Although different from your arbitrary distance k (arc size 2*k?), they're not 
too dissimilar in the kind of distance covered - with the primary distinction 
being that mine proposes using set interval ranges (based on powers of 4), and 
the node picks the range which it's PKH fits into, rather than the range being 
centred on the node.

---

> I can overload those N nodes by generating node addresses that also lie in 
> that octant.
> By just iterating over scalars, about 1/8 of the generated node addresses 
> will lie in the target octant.

If you target the first layer of the tree only, then once nodes spill to the 
second layer, they will filter out any gossip which does not concern them. It 
isn't sufficient to just brute force the first 2 bits, but you need to do it 
for 2^depth, where depth is the target's maximum they're willing to spill to 
(which is not shared).

However, commodity hardware can attempt millions of ec_mult/hash per second, so 
getting a node into the bucket you want is trivial anyway for small depth.

> In order to disrupt a particular node, I must place fake nodes near that 
> node, in an attempt to force its neighbors to reduce the arc of the circle 
> that they can map.
> However, I need to generate fake nodes that are nearer to that node than 
> genuine honest nodes.
> This means the probability of me generating such node addresses are much 
> lower than 1/8, thus requiring that I devote more energy to generating the 
> falsified attack nodes.

I would argue that the above is also true in your approach. It would still be 
trivial to brute force the prefixes which minimize the distance between 
themselves and the target, even with a network of tens of millions of nodes. 
The amount of energy which would need devoting does not seem like it would a 
deciding factor for this attack.

> Further, in executing this attack, while I disrupt one node very well, and 
> nearby nodes somewhat, my effect will be far less than disrupting 1/8 of the 
> network.

Since the attack needs to target the maximum depth that nodes might spill to, 
then the amount of the network which could be affected by the attack would be 
1/4^depth. I can't imagine it being very different in your distance based 
approach, since we're considering the same kind of distances, from a different 
perspective.

---

> Once I get a node address in the targeted octant, I can commit a nominal 
> amount of Bitcoin into some 2-of-2 with an existing actual node I control and 
> the generated node address, then synthesize some node gossip containing those 
> node addresses.

The way I see it, this potential attack affects the global network generally. 
Somebody could try this regardless of our proposals, and try to flood the whole 
network with spam.

BOLT#7 only states that a node MAY forward information about nodes for which it 
does not know any channels. One could reconsider this approach if constrainted 
on resources, such as to say if depth>X, MUST NOT forward any information about 
nodes with no known channels, or perhaps have a variation of gossip which 
combines node information into channel broadcasts to ensure that such node spam 
can't occur. I think the vagueness of the spec on this rule is inidicative that 
it needs addressing.

To try and mount an attack with greater chance of success, the attacker should 
need to open many new channels, for which they face obvious constraints on the 
bitcoin network, and real costs. And since the attack can only attempt to 
degrade LN at best, and offers no guarantee of censoring, it seems unlikely 
that an attacker would exhaust significant resources to try and target somebody 
with a low chance of success.

The attacker would also likely use nominal amounts for the channels they're 
trying to attack with, as they would not want to lock up significant funding. 
It may be trivial to filter many tiny capacity channels as they're not likely 
to be useful for making payments anyway.

---

> This unbalances the octree such that one octant has a far lerger number of 
> nodes inside it than the other octants.
> The N nodes that promised to keep track of the routemap within the octant 
> find that they need to take up more and more memory to store the octant data 
> because of this targeted attack.
> Eventually, the N nodes start dropping out of this responsibility since they 
> run out of resources.

Each node would treat the quadtree as a perfectly balanced one at the depth 
they're concerned. The buckets themselves vary in capacity, and if the gossip 
reaches capacity at any depth, the bucket spills to the next depth. If an 
attacker managed to overflow a leaf bucket, then the potential victim would 
need to consider another gossip approach.

On determining whether one is being attacked though, it should be possible to 
detect based on information size before and after spilling. If one is being 
specifically targeted, then the information before and after spilling will be 
almost the same, with little saving on gossip, where normally spilling might 
expect a 50%+ saving. A client could set a low threshold on the minimum saving 
they expect to get from spilling, and if it is not met, they might determine 
that they are being targeted, and take countermeasures.

To me it seems like your distance narrowing suffers the same issue. As the size 
of your gossip grows, you will narrow the distance k for which you concern 
yourself with gossip - but at some point you must have a minimum k, else it 
will be too narrow to have any useful information. What happens if you decrease 
k to some minimum, and the amount of spam still overflows the capacity limits 
you've set?

> Once the number of nodes that hold that octant drops, I can then switch to 
> targeted DDoS attacks on the remaining octant-mapping nodes, especially easy 
> to locate them since they openly broadcast the fact that they promise to map 
> the octant they are in.
> Now the octant becomes hard to access for 7/8 of the network.
> I have successfully executed a partitioning attack and disrupted the 
> operation of an entire octant.

There are no "octant-mapping nodes". Every node knows about every channel from 
its own bucket to other buckets. This remains true even after spilling - 
although the quantity of information is reduced. Performing a DoS on a node who 
stores the whole network information does not affect other users, because 
they're not exclusive holder of information. Every single participant of the 
network could operate at depth=2. There does not need to exist anybody who has 
the full network map.

Also, if most nodes follow a reasonable autopilot strategy, then every node in 
a bucket also has at least some channels to other buckets - meaning the target 
of a DDoS would essentially be the entire bucket to try and completely isolate 
it. Remember that I'm suggesting half (or more) of the open channels in a 
bucket are to other buckets.

A node doesn't necessarily need to broadcast the depth at which they gossip, 
but could indicate a deeper depth, so as to reduce the amount of queries made 
to it, and hide the truth about the full information they carry. They shouldn't 
indicate a smaller depth than the one at which they gossip as they'll be unable 
to answer to a majority of queries.

If a node is being targeted they could specify a large depth at which it would 
be infeasible for an attacker to target, whilst secretly maintaining 
information at a smaller depth, but by doing so, they're also stating that 
they're not going to be widely useful.

I'm also not suggesting this be the only gossip strategy. At minimum I would 
also want friend-of-a-friend-of-a-friend type gossip in addition to this, 
because those channels are the ones we'll most likely be routing payments over 
anyway, so we should have them cached separately from this quadtree proposal. 
Several other gossip approaches could be used too.

---

> Crucially no node admits to how large an arc of this circle they map out, 
> only that they are more likely to map points nearer to themselves.

> This is because nodes only commit to *probabilistically* being more likely to 
> know nodes near to them, than nodes that are not near to them.

I can see how using a probabilistic approach might mitigate the issue to some 
extent, but it doesn't seem like a cure. Since the information nearest to you 
is more proabable to be forwarded to you, an attacker could still try to 
overwhelm your resources by plaing enough nodes nearby. How is the probability 
decided? Does your node probabilistically filter incoming information, or do 
you signal to your peers to probabilisitcally filter it?

In the latter case, while not necessarily admitting what you store, it seems 
like you'll still leak hints about it. What information are you proposing gets 
communicated (if any) between peers to filter gossip being sent?

My approach is based on the idea that the broadcaster will always know whether 
or not to forward you information based on the filter you give them, with the 
default filter being a bit-mask for the depth of the quadtree matching your 
PKH. A node will forward you all information about any channel which has one or 
both nodes matching the bit mask, and both of the nodes for every channel. 
Everything else gets filtered (unless other filters or strategies are agreed). 
A node forwarding you gossip you've specifically requested to be filtered could 
be considered malicious.

The filter a node communicates with its peer does not necessarily need to 
indicate the amount of the network they actually gossip about. For example, a 
node wishing to have all information about bucket 00b might request filters for 
buckets 0000b, 0001b, 0010b and 0011b from different peers. If each peer 
forwards information for their buckets, then the node will learn of all 
information in 00b without ever revealing to anybody, at the cost of some 
duplication.

---

> They do not commit, as in your proposal, to absolutely knowing everything 
> within their committed area.

Ultimately, my proposal is a convention and not a hard rule. Any commitment 
comes with limits and it isn't to be *assumed* that a node will know everything 
about their local topology, only that they are very likely to know it under 
normal circumstances. It doesn't prevent anyone from using other techniques, or 
to try routing through different buckets (which one might want to try for 
increasing privacy).

Any feature must be considered the same. Nodes could advertize that they 
support a feature and then make no commitment to following it.

> Nodes that openly broadcast that they know the nodes within the same octant 
> they are, are nodes that want to be DDoS'ed.
> Thus, my definition of a "special" node is a lot looser than you seem to 
> define it.
> Anything that makes it possible to point to a node and say "this is a good 
> node to attack, in order to disrupt Lightning" is special.

Acknowledged, but I'm not convinced there is a DDoS motive. Disrupting specific 
nodes doesn't necessarily disrupt other users. Obviously, it can disrupt the 
friend and friend-of-a-friend of the target, but this will always be true. The 
gossip network specifically intends to publish friend-of-a-friend information, 
so targeted DDoS attacks on individuals and their friends are not necessarily 
avoidable.

---

> * No node shall reveal the extent of its knowledge of the network, since if 
> it reveals that it knows the entire network, it may become a target for 
> attack in order to degrade the network.

My main question about your proposal is, given that bandwidth is a key resource 
constraint, what strategy are you using to prevent the sending/receiving of 
unwanted gossip information, which also prevents leaking information about 
which gossip you're interested in?

>   * This also implies that if a node does not know the location of some node, 
> instead of admitting its ignorance, it should instead delegate to another 
> node that might have better information; otherwise it would be possible to 
> profile nodes to determine how much of the network they know.

I'm not sure I agree. I'd prefer to go the opposite way and always fail fast.

It could be specified that if you've indicated that you'll answer queries for a 
specific bucket, and somebody requests information about a node or channel 
outside of the bucket, and which there exists no direct channel from within 
your bucket to that node, then you should automatically fail the query, even if 
you know the information.

A query will then either return a very likely yes (on information within the 
bucket), or a definite no (anything not in the bucket). Nothing more than which 
was previously specified is leaked, even if you keep more gossip information 
than you have publicized. If probing can only reveal information that is 
already public, there is nothing to be gained.

---

PS, don't take any of this to be a dismissal of your proposal because I fully 
acknowledge your perspective and concerns.
I think there are potential useful optimizations in my approach which I'm not 
sure how equivalent could be acheived with yours.
But if I am convinced there are unresolvable problems with the idea I'll drop 
it and focus on your approach.

Regards,
Mark H
_______________________________________________
Lightning-dev mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev

Re: [Lightning-dev] Routemap scaling (was: Just in Time Routing (JIT-Routing) and a channel rebalancing heuristic as an add on for improved routing success in BOLT 1.0)

Reply via email to