Good morning Bastien,

>
> I think that the main points that make routing in the Lightning Network 
> different from game path-finding algorithms are:
>
> -   Paths are consumed by payments, effectively moving available balance 
> between sides of a channel
> -   The routing algorithm doesn't know remote channels balance allocation 
> (and that changes constantly)
> -   The cost of a path depends on the value you're sending (proportional fees)
> -
> -   This encourages algorithms not to search for an optimal solution (because 
> an optimal solution on outdated/incomplete data doesn't even make sense) but 
> rather fast and good enough solutions with retries

I believe the differences are smaller than you might initially think.

Units move around on the map and a pathfinding algorithm cannot predict how the 
*other* units owned by allied players will be, once the current units asking 
for a path have moved along the path.
i.e. the algorithm does not know how remote tiles are occupied (and that 
changes constantly)

Faster units really should be able to walk around slower units, because there 
is often a tradeoff between speed and combat effectiveness and a player asking 
a faster unit to move probably is depending on their speed.
i.e. paths can be blocked by slower units, effectively becoming slow-moving 
obstacles that need to be worked around.

And so on.

>
> There are a few technicalities that might be a problem for some of your 
> suggestions, I'm interested in your opinion on how to address them.
>
> For `permuteroute`, you mention the following pre-requisite:
>
> > the original payer is informed of which node reported the failure and which 
> > channel failed.
>
> We currently don't have a solution for reliable error reporting, as pointed 
> out in [1].
> I think making progress on this thread would be interesting and useful for 
> routing heuristics. 

We already have sufficiently-good error reporting on route failures: 
https://github.com/lightningnetwork/lightning-rfc/blob/master/04-onion-routing.md#returning-errors

> When an origin node receives an error message matching a transfer it initiated
> (i.e. it cannot return-forward the error any further) it generates the `ammag`
> and `um` keys for each hop in the route.
> It then iteratively decrypts the error message, using each hop's `ammag`
> key, and computes the HMAC, using each hop's `um` key.

The number of iterations of decryption is how distant the error-reporting node 
is from the payer.
Knowing the entire route, we can know which node reported the error.
The channel that is failing is then the channel *after* the error-reporting 
node (assuming bit `NODE` (`0x2000`) is not set in the `failure_code`: if it is 
a node-level error we should back off by one node and mark the erring node as 
unreliable).
The node reporting the error is the node that we start the limited-range search 
to "heal" the path.
`permuteroute` does not *need* better error reporting than we already have.

Of course, if a node is malingering and does not report anything, there is 
nothing we can do, but this is unavoidable anyway and does not prevent use of 
`permuteroute` in other cases either.

Indeed, the other insight here is that, if we were able to receive an error 
report from forwarding node N, this implies that every node and channel between 
us and node N is reliable.
`permuteroute` reuses this prefix, since it is known-reliable.

>
> I thought about path pre-computation and path caching, but what bothered me 
> is that the cost depends on the amount you want to send.
> When pre-computing / caching, you have to either ignore that completely 
> (which can be fine, I don't think trying to always find the most
> cost-efficient route is a reasonable goal) or take into account some kind of 
> "universal" factor that works for most amounts. How did you take
> that into account in your pre-computation experiments?

Just to be clear: I have not run any experiments.
I work on Lightning in a hobbyist capacity, am a small-time HODLer, and cannot 
even afford to run a mainnet LN node (which is why I had to ask Rene to time 
`getroute`).
I mostly get by on sheer code review, tons of armchair reasoning, and sheer 
force of will.

While costs depend on the amount you want to send, we observe that there are 
three main costs:

* Risk of locking up funds for `cltv_delta` blocks
* `fee_proportional_millionths`
* `fee_base_msatoshi`

Of these, the first two are proportional to the value being sent.
So if you double the value, you double the cost, but you also double this same 
cost on *every* alternate path.
And pathfinding algorithms do not judge the absolute cost, but the relative 
cost of every path.
In short the cases below are equivalent and given the same map and the same 
source and destination, will result in the same path (assuming your variables 
do not overflow, of course):

* Plains cost 1 movement point, Forests cost 2 movement point
* Plains cost 2 movement point, Forests cost 4 movement point
* Plains cost 3241 movement point, Forests cost 6482 movement point

The issue is not so much that costs are proportional to the value being sent.
The *real* issue is that costs are *both* fixed and proportional.
So we need to select a balancing factor between the fixed and proportional 
costs.

We can assume "past performance is an indicator of future performance" and 
record the average payment size of the user in order to determine how to 
balance the fixed and proportional costs.
Picking an example value of say 1mBTC at the start, when the user has not used 
the node yet, seems reasonable.

Using the average value here, assuming the distribution of values the user 
sends is the same in the future, minimizes the error between the cached result 
vs the actual resulting fees.

Again, the point is that we need some sort of way to set limits on the fees and 
risk the user has for payments, hence the need for `maxfeepercent` and 
`maxdelay`.
We cannot reliably get perfect paths on potentially-stale data anyway.
So I think we can just use whatever path we find using precomputation and 
caching (using some "example value"), and then do a double-check that the 
generated path does not get past `maxfeepercent` and `maxdelay`.
If the generated path gets past the limits, we fall back to a non-precomputed 
search: given a good-enough "example value" this should be rare in practice.

>
> I do agree that multi-part payments and trampoline (hierarchical routing) can 
> offer a lot of room for algorithmic improvements and your 
> ideas on how to leverage them resonate with mine.
>
> An interesting thing to note is that trampoline (in the current proposal at 
> least) allows trampoline nodes to leverage multi-part payments
> "at each hop", meaning that a trampoline node can join/split arbitrarily an 
> incoming payment to reach the next trampoline node.

I agree, this is an interesting thing.

> While implementing a first version of multi-part payments, I realized that 
> they need to be tightly integrated to the routing algorithm.
> Since each payment "consumes" a path, potentially "stealing" it from other 
> payments, a naive implementation of multi-part payments
>  would try to use different paths for each sub-payment, but that's an 
> inefficient way of solving it. Working on multi-part payments made
> me think that maybe our routing problem is more similar to a circulation or 
> network flow problem [2] rather than path-finding. Have you
> thought about this? If so what is your opinion?

Most solutions to the network flow problem seem to require an accurate view of 
flows at each node, which we do not have.
For multi-part, this is actually similar to the issue of sending a blob of 
units from one location to another, while keeping the units in a cohesive blob 
without them forming a line of units where everyone walks nearly the exact same 
path.
(older RTSs tend to form these lines when sending blobs of units on 
long-distance trips, with the downside that on reaching the combat location 
units come to battle one at a time rather than all at once, reducing the impact 
of the blob; players learned to micromanage these lines near the combat 
location so that at least the first entry into the attack is a small group of 
units rather than a solitary one.)
Walking nearly the exact same path is, incidentally, the thing we want to avoid 
in multi-part payments, incidentally, so we have a similar problem as RTS games 
with lines of units going into battle one-by-one.

Hence, why I proposed the use of flocking, which is a technique used to retain 
cohesion of a blob of units.
For example, some RTS games have a concept of putting their units "in 
formation", which is actually just a way to excuse the flocking behavior to the 
player.

Another solution is to use `permuteoute`.
Run normal single-pathfinding algo.
Find the smallest-capacity channels in the returned route and `permuteroute` 
around those channels, resulting in an alternate route that avoids the 
smallest-capacity channels (which are more likely to fail when multiple 
payments run through them) but shares the rest of the path with the original 
route.
Keep doing this until `permuteroute` fails, then we have a bunch of alternate 
routes we can try to use for multi-part routing.

Regards,
ZmnSCPxj
_______________________________________________
Lightning-dev mailing list
Lightning-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev

Reply via email to