Hi Elias,

I want to clarify that I didn't mean to suggest that your points were
incorrect. I apologize if I gave the wrong impression.

> > I do not see the privacy concern here for two main reasons:
> >
> > The dry run will be done on a selected node (who wants to run it), so
> > I am assuming that with my research node, I am not buying my house (or
> > maybe a coffee).
>
> I'm confused: wasn't the idea to collect data from real-world forwarding
> nodes rather than creating yet another synthetic/research data set?

Maybe I am mistaken about what a dry run means here, so let me restate
my understanding of Carla and Clara's proposal (please correct me if I'm
wrong). We are implementing this algorithm to test whether the theory
holds up in practice, selecting a couple of nodes that will enable this
feature and share the data with us. I assume that we don't need many
nodes for this task, because reputation is local: each node builds its
own reputation.

To summarize, I'm assuming that this dry run is conducted with selected
nodes. Once we understand the direction to proceed, we can prioritize
privacy a bit more and potentially expand the data collection over time.
At this point, I agree with you that we must consider ways to protect
the privacy of nodes and channels.

> Just to throw out some ideas: you could for example assume that over a
> sufficiently long collection period each channel of a node will
> eventually be used and show up in the dataset, i.e., you get a good
> approximation of the number of channels the observation point has with
> its neighbors. This alone might already be enough to give a good guess
> which obfuscated node id corresponds to which node in the network. If we
> now use the timestamps we can further exclude any nodes/channels that
> couldn't have been used at the time the HTLC was sent from the candidate
> set, and, especially if we have access to datasets from neighboring
> nodes, we may be able to easily derive which anonymized clusters
> correspond to which real world clusters.
> You could find neighboring
> nodes by checking that all `ts_added_ns` timestamps between two
> candidates are sufficiently close together (i.e., that no additional hop
> would "fit in there" assuming a reasonable real-world RTT). Once we have
> re-identified which obfuscated nodes are which real-world nodes, we
> could derive HTLC amount from the gathered fees, and can draw
> conclusions about the liquidities. We then even may use the HTLC
> resolution time delta to draw some further conclusions on the
> network-distance of the HTLC destination. Of course, all of these are
> estimations, so the adversary has some error probability in there, and
> fuzzing the timestamps might already go a long way making the
> adversary's life harder.

I understand your point. To me, this use case involves continuous data
collection over time. For instance, I'm interested in tracking the
evolution of the network for the XYZ feature. I see that it is possible
to reverse engineer the data and reconstruct missing parts of the graph
that a node might be falsifying. It's not a straightforward task, but
it's feasible.

However, my main point (and I might not have explained myself clearly,
for which I apologize) is that if we run this data collection for 4 to 5
months until our next analysis, the privacy concerns seem a bit
overcomplicated to me. This is because I am assuming that we also need
to trust the source of this data.

> > P.S: I had some real examples in the university that I came from of
> > PhD program failed to start due to the lack of real data
>
> Yes, as said before I'm very familiar with trying to do Lightning
> research in absence of real-world data sets :)
>
> To be clear I'm not objecting this effort, just saying a) that sharing
> aggregated results is probably a good starting point and b) that the
> framework and the associated risks of the data collection should be
> clearly communicated beforehand to node operators considering sharing
> their data.

I agree!
We should convey our goals, the underlying purpose of our actions, and
also address any potential drawbacks.

Thanks for sharing those insightful points.

Vincent.
_______________________________________________
Lightning-dev mailing list
Lightning-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev