Re: [Lightning-dev] Jamming Mitigation Dry Run

Elias Rohrer Tue, 08 Aug 2023 07:34:45 -0700

Hi Vincent,


On 6 Aug 2023, at 21:35, Vincenzo Palazzo wrote:

I do not see the privacy concern here for two main reasons:
The dry run will be done on a selected node (who wants to run it), so I am assuming that with my research node, I am not buying my house (or maybe a coffee).

I'm confused: wasn't the idea to collect data from real-world forwarding nodes rather than creating yet another synthetic/research data set?

The only leak of privacy that I see is for the following fields, but they are meaningless from the analysis point of view, so we can fake them (I will do this in core lightning); you just need to make sure that the fake channel id is always the same, right?
* channel_in (uint64)[P]: the short channel ID of the incoming channel
that forwarded the HTLC.

* channel_out (uint64)[P]: the short channel ID of the outgoing
channel that forwarded the HTLC.

* peer_in (hex string)[P]: the hex encoded pubkey of the remote peer
for the channel_in.

* peer_out (hex_string)[P]: the hex encoded pubkey of the remote peer
for the channel_out.
I care about this point to make this research result 100% reproducible
by giving access to raw data to involve more people in verifying and
proving that we are wrong.

Due to the limitation that the data should come from real nodes
with real bitcoin involved, we can fall into the situation where we live
in our bubble of certainty.
Am I missing something regarding the point that we can fake the channel_id and
the node pub key?

Sure you can obfuscate these fields, but that doesn't mean it's not possible to re-identify node ids and channels by correlating the dataset with publicly available data, such as the graph topology and gossip data.
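
As a rough illustration of the "fake but always the same" obfuscation discussed above, here is a minimal Python sketch using a keyed hash. The function names, the truncation length, and the per-node secret are assumptions made for this example and are not part of the proposed data format or of any implementation. Note that it only keeps pseudonyms consistent; it does nothing against correlation with public graph data.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-node secret. It stays with the node operator and is
# never published alongside the dataset.
SECRET_KEY = secrets.token_bytes(32)

# Fields marked [P] in the quoted field list.
PRIVATE_FIELDS = ("channel_in", "channel_out", "peer_in", "peer_out")


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Map an identifier to a stable pseudonym via HMAC-SHA256.

    The same input always yields the same output for a given key, so
    per-channel statistics still line up across records.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncation length is an arbitrary choice


def obfuscate_record(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Return a copy of the record with the private fields pseudonymized."""
    out = dict(record)
    for field in PRIVATE_FIELDS:
        if field in out:
            out[field] = pseudonymize(str(out[field]), key)
    return out
```
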

Just to throw out some ideas: you could for example assume that over a sufficiently long collection period each channel of a node will eventually be used and show up in the dataset, i.e., you get a good approximation of the number of channels the observation point has with its neighbors. This alone might already be enough to give a good guess which obfuscated node id corresponds to which node in the network. If we now use the timestamps we can further exclude from the candidate set any nodes/channels that couldn't have been used at the time the HTLC was sent, and, especially if we have access to datasets from neighboring nodes, we may be able to easily derive which anonymized clusters correspond to which real-world clusters. You could find neighboring nodes by checking that all `ts_added_ns` timestamps between two candidates are sufficiently close together (i.e., that no additional hop would "fit in there" assuming a reasonable real-world RTT). Once we have re-identified which obfuscated nodes are which real-world nodes, we could derive HTLC amounts from the gathered fees, and can draw conclusions about the liquidities. We may then even use the HTLC resolution time delta to draw some further conclusions on the network distance of the HTLC destination. Of course, all of these are estimations, so the adversary has some error probability in there, and fuzzing the timestamps might already go a long way toward making the adversary's life harder.
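
To make the timestamp argument a bit more concrete, here is a small Python sketch of the adjacency check and of timestamp fuzzing as a countermeasure. Everything apart from the `ts_added_ns` field name is an assumption for illustration: the hop delay bound, the jitter width, and the premise that the adversary has already paired up records belonging to the same HTLC.

```python
import random

# Hypothetical upper bound on the delay of one direct hop, in nanoseconds.
ASSUMED_MAX_HOP_DELAY_NS = 500_000_000


def plausibly_adjacent(ts_upstream_ns, ts_downstream_ns,
                       max_hop_delay_ns=ASSUMED_MAX_HOP_DELAY_NS):
    """Check whether two datasets could come from directly connected nodes.

    Both arguments are lists of `ts_added_ns` values, paired per HTLC
    (assumed to be known). Returns True only if every gap is non-negative
    and small enough that no additional hop would fit in between.
    """
    if not ts_upstream_ns or len(ts_upstream_ns) != len(ts_downstream_ns):
        return False
    return all(
        0 <= down - up <= max_hop_delay_ns
        for up, down in zip(ts_upstream_ns, ts_downstream_ns)
    )


def fuzz_timestamp(ts_ns, jitter_ns, rng=random):
    """Add uniform noise in [-jitter_ns, +jitter_ns] before publishing."""
    return ts_ns + rng.randint(-jitter_ns, jitter_ns)
```

With jitter well above the assumed hop delay, the adjacency check above would start failing for true neighbors, at the cost of less precise timing data for the analysis.
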

P.S.: I saw some real examples at the university that I came from of PhD programs that failed to start due to the lack of real data

Yes, as said before I'm very familiar with trying to do Lightning research in absence of real-world data sets :)

To be clear, I'm not objecting to this effort, just saying a) that sharing aggregated results is probably a good starting point and b) that the framework and the associated risks of the data collection should be clearly communicated beforehand to node operators considering sharing their data.


Best,

Elias

_______________________________________________
Lightning-dev mailing list
Lightning-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
