Re: [TDF Community] [Board Discuss] LibreOffice - peer2peer collaboration bits
On 05.05.24 12:51 PM, Julien Nabet wrote:
> Before talking about technical architecture, is there a real will about this? Are there enough financial resources for this?

There is will by at least some, see https://community.documentfoundation.org/t/budget-request-for-a-p2p-libreoffice-project/12016

The numbers are purely random to me as long as we have no idea of the scope. And I believe it's an average "Eve" end-user scenario, since users with free and open LibreOffice can connect to a business environment such as COOL using the desktop or mobile application.
Re: [TDF Community] [Board Discuss] LibreOffice - peer2peer collaboration bits
Hello,

I'm not against the idea of a collaborative Office, but who's gonna work on this? It's not a simple feature that 1 or 2 volunteers can work on for some months and be done with. There are security issues, the integrity of the produced documents, perhaps tracking who modified which parts and when, conflict resolution, etc.

Even if the feature is developed by a private company, I expect the bug trackers would be on TDF. Dealing with these bugs will be more complicated: how do we easily reproduce being behind a company firewall? How do we simulate 2 or more people working on the same doc?

IMHO it requires dedicated/paid people from TDF or from some private companies (or both) to work on this. These people may be replaced over time, of course, but they must always be identified, so that if there's any problem, we know whom to contact and they'll respond within a reasonable time (less than 1 week). I mean, I expect the target of this feature is mainly companies and associations, and these can be more picky.

Before talking about technical architecture, is there a real will about this? Are there enough financial resources for this?

Julien
Re: [TDF Community] [Board Discuss] LibreOffice - peer2peer collaboration bits
On 04.05.24 12:05 PM, Thorsten Behrens wrote:
> > Heiko wrote:
> > > How to connect two or more individuals? It requires routing
> > > ...
> > > What could be achievable on TDF infrastructure?
> >
> > Given what I've said above - let's try to make this completely independent of TDF infrastructure. Either with no switching-server at all or with something minimal that hopefully might not even need TDF continuously maintaining a server. Note that maintenance by us also has privacy implications, much more so than third-party-less P2P.
>
> Yup. At any rate, requiring any kind of centralized server infrastructure has inevitable scalability challenges. It would still be useful if TDF could help with bootstrapping whatever server infrastructure will be needed, though.

I believe it's crucial for success to have this networking open and free for everyone. And I cannot think of many alternatives to TDF for hosting the discovery and handshake around LibreOffice Technology.

While we clearly should not route the actual content through the server, for privacy and performance reasons, there needs to be some kind of registration that allows connections at a random endpoint and lets one or more others join at that address. It is probably not a big deal to make the connection and establish some kind of server-client networking, and maybe existing protocols provide all the necessary pieces, but I wonder how well it scales. Is average customer hardware capable of keeping track of 1M users at the same time? Or 100M?

Given all that is possible, are there non-technical implications that require us to govern the process?
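To make the "registration" idea above a bit more concrete, here is a minimal sketch of what such a rendezvous registry might keep track of. It is only an illustration under stated assumptions: the class name `RendezvousRegistry`, the session ID strings, and the five-minute expiry are all hypothetical and not taken from any LibreOffice or TDF code. The point it tries to show is that the server would only map a session ID to an endpoint for the handshake and would never see document content.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port) - hypothetical handshake data


@dataclass
class Registration:
    endpoint: Endpoint
    expires_at: float  # clock value after which the entry is stale


class RendezvousRegistry:
    """In-memory map from session ID to endpoint (illustrative only).

    Stores handshake metadata only; no document content passes through it.
    """

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Registration] = {}

    def register(self, session_id: str, endpoint: Endpoint) -> None:
        # A later registration for the same session replaces the earlier one.
        self._entries[session_id] = Registration(
            endpoint, self._clock() + self._ttl)

    def lookup(self, session_id: str) -> Optional[Endpoint]:
        entry = self._entries.get(session_id)
        if entry is None:
            return None
        if entry.expires_at <= self._clock():
            # Drop stale entries so the table does not grow without bound.
            del self._entries[session_id]
            return None
        return entry.endpoint
```

A host would call `register("some-session", (ip, port))` and a joining peer would call `lookup("some-session")`. Each entry is a few dozen bytes plus dictionary overhead, so the scaling question is less about memory per entry and more about connection handling and abuse, which this sketch does not address.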
Re: [TDF Community] [Board Discuss] LibreOffice - peer2peer collaboration bits
Moving the follow-up here, as Thorsten suggested.

So, I'm arguing that we can side-step the need for "discovery" in the P2P sense. Or even more fundamentally: P2P has several meanings. There is "peer-to-peer networking", where nodes build an ad-hoc network in which they need such peer discovery; and there is "peer to peer" on top of an existing network, meaning the opposite of "through a central server". I believe our interest should be in the second kind, as I'll elaborate below.

On 04/05/2024 13:05, Thorsten Behrens wrote:
> > > How to connect two or more individuals? It requires routing
> >
> > I would opt for a simple protocol which does not take on problems more complex than it has to, at least for a preliminary implementation. Specifically - the Internet routes for us.
>
> Connecting people for peer2peer communication is conventionally called discovery. There's ample writeups on the topic, here's two pointers:
>
> * quite an accessible summary: https://jsantell.com/p2p-peer-discovery/
> * scientific survey paper: https://www.inets.rwth-aachen.de/wp-content/uploads/2022/07/service_discovery_survey.pdf
>
> > Anyway - a server would not necessarily be required. That is, P2P connection will happen when two users want to connect; but in our case, they have already "connected" in some other way to agree on making the P2P connection. So I suggest, in light of my previous point, that we assume that the two (or more) users have another, independent means of communication over which they can send some data for bootstrapping the LibreOffice P2P. And this could be made easy, UI-wise, so that the user just needs to press a copy button, and paste some string so that the other user can see it. The other user copies the string and pastes it into an appropriate area in their own running LO instance. Then the connection is set up.
>
> So in a word, piggy-backing on another, existing communication channel?

Sort of, but not exactly. Think about a (non-Internet) phone conversation.
It's a "peer-to-peer" protocol, in the sense that there isn't some "conversation server" which is active while the two conversants converse: you dial the other person's number and connect to them. But this is on top of an established network, which routes things for you, without you having to "discover" anybody or anything.

Also - the phone number is a priori magic, which in ad-hoc peer-to-peer networks you typically don't have. You need to know the person or organization you're calling and obtain a numeric destination handle - not through the use of your phone for the actual phone call. Maybe you've met and exchanged numbers, maybe you got a card or saw an ad etc.

So, the "piggy-backing" would be on the interpersonal/social communication which made the two (or more) people want to collaboratively edit a document in the first place, plus the regular IP protocol and its network structure.

> > What kind of data? Basically, I assume that would be a tuple of (IP, port number, public key). I will admit that this doesn't cater to the case of two firewalled users; that's a situation I'm not experienced enough in handling, but I do know there are [many approaches](https://en.wikipedia.org/wiki/NAT_traversal) (Wikipedia) to handling it. Some may require a third-party "switching server", some may not. But such a server can probably be very minimal and hopefully not even aware of what protocol it's being used to allow connections for.
>
> Or taking the idea one step further: re-using the other, existing comms channel, also for all of the collaboration traffic!

Well, I'm not sure it would be beneficial to create a strong tie-in between our comm protocol and a specific choice of how the two users communicate otherwise. But - that is a possibility. It may make things easier technically, I suppose - especially when it comes to more-than-two collaborators.

> > Heiko wrote:
> > > What could be achievable on TDF infrastructure?
> > Given what I've said above - let's try to make this completely independent of TDF infrastructure. Either with no switching-server at all or with something minimal that hopefully might not even need TDF continuously maintaining a server. Note that maintenance by us also has privacy implications, much more so than third-party-less P2P.
>
> Yup. At any rate, requiring any kind of centralized server infrastructure has inevitable scalability challenges. It would still be useful if TDF could help with bootstrapping whatever server infrastructure will be needed, though.

Agreed; let's just think of such a bootstrapping as something to avoid if possible, even at the cost of some tradeoffs. This relates specifically to my two points of motivation for this feature (which I'll quote for those who follow this list but not the forum):

This "1. Undermines the paradigm Microsoft, Google and others are pushing, of your work as a user going through them, visible and data-mine-able to them, requiring connectivity to their servers… 2. Offers potential for more easily
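As a rough illustration of the copy-and-paste bootstrap string discussed in this message, the tuple of (IP, port number, public key) could be packed into a single pasteable token along the following lines. This is a sketch under assumptions, not a proposed wire format: the names `BootstrapInfo`, `encode_token` and `decode_token`, the JSON field names, and the choice of base64 are all hypothetical, and the public key is treated as opaque bytes without assuming any particular key type.

```python
import base64
import ipaddress
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class BootstrapInfo:
    ip: str            # IPv4 or IPv6 address of the hosting peer
    port: int          # port the hosting peer listens on
    public_key: bytes  # opaque key bytes; key type is deliberately unspecified


def encode_token(info: BootstrapInfo) -> str:
    """Pack the tuple into one URL-safe string the user can copy."""
    payload = {
        "ip": info.ip,
        "port": info.port,
        "key": base64.b64encode(info.public_key).decode("ascii"),
    }
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(token: str) -> BootstrapInfo:
    """Parse a pasted token; raises ValueError on malformed fields."""
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    payload = json.loads(raw)
    ipaddress.ip_address(payload["ip"])  # raises ValueError if not an IP
    port = int(payload["port"])
    if not 1 <= port <= 65535:
        raise ValueError("port out of range")
    return BootstrapInfo(payload["ip"], port,
                         base64.b64decode(payload["key"]))
```

One user would send the result of `encode_token(...)` over whatever channel they already share, and the other would paste it in and call `decode_token(...)`. The token only carries addressing data and a key; it does nothing about NAT traversal, and the pasted key is only as trustworthy as the channel it was sent over.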
Re: [TDF Community] [Board Discuss] LibreOffice - peer2peer collaboration bits
Hi Eyal, all,

this came up on board-discuss [1], but I believe the best place to have this discussion is on the LibreOffice developer list. Let's please follow up here.

A few comments in-line -

Eyal Rozenberg wrote:
> Heiko wrote:
> > How to connect two or more individuals? It requires routing
>
> I would opt for a simple protocol which does not take on problems more complex than it has to, at least for a preliminary implementation. Specifically - the Internet routes for us.

Connecting people for peer2peer communication is conventionally called discovery. There's ample writeups on the topic, here's two pointers:

* quite an accessible summary: https://jsantell.com/p2p-peer-discovery/
* scientific survey paper: https://www.inets.rwth-aachen.de/wp-content/uploads/2022/07/service_discovery_survey.pdf

> Heiko wrote:
> > given that users not want to fiddle around with ports and firewalls neither to share IP addresses I presume this requires a server.
>
> But then it would not really be P2P, would it?
>
> Anyway - a server would not necessarily be required. That is, P2P connection will happen when two users want to connect; but in our case, they have already "connected" in some other way to agree on making the P2P connection. So I suggest, in light of my previous point, that we assume that the two (or more) users have another, independent means of communication over which they can send some data for bootstrapping the LibreOffice P2P. And this could be made easy, UI-wise, so that the user just needs to press a copy button, and paste some string so that the other user can see it. The other user copies the string and pastes it into an appropriate area in their own running LO instance. Then the connection is set up.

So in a word, piggy-backing on another, existing communication channel?

> What kind of data? Basically, I assume that would be a tuple of (IP, port number, public key).
> I will admit that this doesn't cater to the case of two firewalled users; that's a situation I'm not experienced enough in handling, but I do know there are [many approaches](https://en.wikipedia.org/wiki/NAT_traversal) (Wikipedia) to handling it. Some may require a third-party "switching server", some may not. But such a server can probably be very minimal and hopefully not even aware of what protocol it's being used to allow connections for.

Or taking the idea one step further: re-using the other, existing comms channel, also for all of the collaboration traffic!

> Heiko wrote:
> > What could be achievable on TDF infrastructure?
>
> Given what I've said above - let's try to make this completely independent of TDF infrastructure. Either with no switching-server at all or with something minimal that hopefully might not even need TDF continuously maintaining a server. Note that maintenance by us also has privacy implications, much more so than third-party-less P2P.

Yup. At any rate, requiring any kind of centralized server infrastructure has inevitable scalability challenges. It would still be useful if TDF could help with bootstrapping whatever server infrastructure will be needed, though.

> Heiko wrote:
> > Isn’t it better to share UNO commands and parameters?
>
> Mmm... maybe... but - what about showing the other party's cursor and mouse movements? You can't do that with UNO commands.

Starting off from Collabora Online - which is a production-ready implementation of LibreOffice collaboration that uses both low-level key & mouse events, as well as UNO commands - I guess the answer is 'both'? ;)

> Heiko wrote:
> > How do we solve the situation when one participant enters text and another deletes the same paragraph?
>
> It doesn't have to be a great solution, as long as it is consistent. i.e.
> if users know that two people on a laggy connection editing the same sentence is likely to get them making changes in wrong positions etc., they will naturally limit the extent to which they do this - like we know from Etherpad. Consistency of behavior and "principle of least astonishment" would be more important than perfect coordination/synchronization of inputs.

With a dedicated server, you don't even have that problem. All input will get serialized through this instance, so there's a strict temporal ordering for all edits. Whatever packet reaches the server first will 'win' in an edit war.

A fully distributed solution (which is way harder to implement!) has no such strict global ordering per se, but there are algorithms such as CRDTs[2], which guarantee eventual consistency in all peers. But you're right, the Etherpad experience shows that under bad network connectivity, user experience will start to suffer. For example, all CRDTs I've looked at would always have a 'delete' operation win over other edits on the same span of text.

> Heiko wrote:
> > Encryption and data integrity is key.
>
> Perhaps TLS if it's a TCP-based protocol, and