Re: [TDF Community] [Board Discuss] LibreOffice - peer2peer collaboration bits

2024-05-06 Thread Heiko Tietze

On 05.05.24 12:51 PM, Julien Nabet wrote:
> Before talking about technical architecture, is there a real will about this?
> Are there enough financial resources for this?

There is will by at least some, see
https://community.documentfoundation.org/t/budget-request-for-a-p2p-libreoffice-project/12016


The numbers are purely random to me as long as we have no idea of the scope.

And I believe it's an average "Eve" end-user scenario, since users with free and
open LibreOffice can connect to a business environment such as COOL using the
desktop or mobile application.




Re: [TDF Community] [Board Discuss] LibreOffice - peer2peer collaboration bits

2024-05-05 Thread Julien Nabet

Hello,

I'm not against the idea of a collaborative Office, but who's gonna work
on this?
It's not a simple feature that 1 or 2 volunteers can work on for
some months and be done with.
There are security issues, integrity of the produced documents, perhaps
tracking who modified which parts and when, conflict resolution, etc.


Even if the feature is developed by a private company, I expect the
bugs would be tracked at TDF. Dealing with these bugs will be more
complicated:

how to easily reproduce being behind a company firewall?
how to simulate 2 or more people working on the same doc?

IMHO it requires dedicated/paid people from TDF or from some private
companies (or both) to work on this. These people may be replaced
over time, of course, but they must always be identified
so that, if there's any problem, we know whom to contact and they'll respond
within a reasonable time (less than 1 week).
I mean, I expect the target of this feature is mainly companies and
associations, and these can be more picky.
Before talking about technical architecture, is there a real will about 
this? Are there enough financial resources for this?


Julien



Re: [TDF Community] [Board Discuss] LibreOffice - peer2peer collaboration bits

2024-05-05 Thread Heiko Tietze

On 04.05.24 12:05 PM, Thorsten Behrens wrote:
> > Heiko wrote:
> > > How to connect two or more individuals? It requires routing
>
> ...
>
> > > What could be achievable on TDF infrastructure?
> >
> > Given what I've said above - let's try to make this completely
> > independent of TDF infrastructure. Either with no switching-server
> > at all or with something minimal that hopefully might not even need
> > TDF continuously maintaining a server. Note that maintenance by us
> > also has privacy implications, much more so than third-party-less
> > P2P.
>
> Yup. At any rate, requiring any kind of centralized server
> infrastructure has inevitable scalability challenges. It would still
> be useful if TDF could help with bootstrapping whatever server
> infrastructure will be needed, though.


I believe it's crucial for success to have this networking open and free for
everyone. And I cannot think of many alternatives to TDF hosting the discovery
and handshake around LibreOffice Technology. While we clearly should not route
the actual content through the server, for privacy and performance reasons,
there needs to be some kind of registration that allows connections to a random
end-point and lets one or more others join at this address.


It's probably not a big deal to make the connection and to establish some kind of
server-client networking, and maybe existing protocols provide all the necessary
pieces, but I wonder how well it scales. Is average consumer hardware capable of
keeping track of 1M users at the same time? Or 100M?


Given all that is possible, are there non-technical implications that require
us to govern the process?




Re: [TDF Community] [Board Discuss] LibreOffice - peer2peer collaboration bits

2024-05-04 Thread Eyal Rozenberg

Moving the follow-up here, as Thorsten suggested.

So, I'm arguing that we can side-step the need for "discovery" in the
P2P sense. Or even more fundamentally: P2P has several meanings. There
is "peer-to-peer networking", where nodes build an ad-hoc network in
which they need such peer discovery; and there is "peer to peer" on top
of an existing network, meaning the opposite of "through a central
server". I believe our interest should be in the second kind, as I'll
elaborate below.

On 04/05/2024 13:05, Thorsten Behrens wrote:
> > > How to connect two or more individuals? It requires routing
> >
> > I would opt for a simple protocol which does not take on problems
> > more complex than it has to, at least for a preliminary
> > implementation. Specifically - the Internet routes for us.
>
> Connecting people for peer2peer communication is conventionally called
> discovery.
>
> There's ample writeups on the topic, here's two pointers:
>
> * quite an accessible summary: https://jsantell.com/p2p-peer-discovery/
> * scientific survey paper:
> https://www.inets.rwth-aachen.de/wp-content/uploads/2022/07/service_discovery_survey.pdf

> > Anyway - a server would not necessarily be required. That is, P2P
> > connection will happen when two users want to connect; but in our
> > case, they have already "connected" in some other way to agree on
> > making the P2P connection. So I suggest, in light of my previous
> > point, that we assume that the two (or more) users have another,
> > independent means of communication over which they can send some
> > data for bootstrapping the LibreOffice P2P. And this could be made
> > easy, UI-wise, so that the user just needs to press a copy button,
> > and paste some string so that the other user can see it. The other
> > user copies the string and pastes it into an appropriate area in
> > their own running LO instance. Then the connection is set up.
>
> So in a word, piggy-backing on another, existing communication
> channel?


Sort of, but not exactly. Think about a (non-Internet) phone
conversation. It's a "peer-to-peer" protocol, in the sense that there
isn't some "conversation server" which is active while the two
conversants converse: You dial the other person's number and connect to
them. But this is on top of an established network, which routes things
for you, without you having to "discover" anybody or anything. Also -
the phone number is a priori magic, which in ad-hoc peer-to-peer networks
you typically don't have. You need to know the person or organization
you're calling and obtain a numeric destination handle - not through the
use of your phone for the actual phone call. Maybe you've met and
exchanged numbers, maybe you got a card or saw an ad etc.

So, the "piggy-backing" would be the interpersonal/social communication
which made the two (or more) people want to collaboratively edit a
document in the first place, plus the regular IP protocol and its
network structure.



> > What kind of data? Basically, I assume that would be a tuple of (IP,
> > port number, public key). I will admit that this doesn't cater to
> > the case of two firewalled users; that's a situation I'm not
> > experienced enough in handling, but I do know there are [many
> > approaches](https://en.wikipedia.org/wiki/NAT_traversal) (Wikipedia)
> > to handling it. Some may require a third-party "switching server",
> > some may not. But such a server can probably be very minimal and
> > hopefully not even aware of what protocol it's being used to allow
> > connections for.
>
> Or taking the idea one step further: re-using the other, existing
> comms channel, also for all of the collaboration traffic!


Well, I'm not sure it would be beneficial to create a strong tie-in
between our comm protocol and a specific choice of how the two users
communicate otherwise. But - that is a possibility. It may make things
easier technically I suppose - especially when it comes to more-than-two
collaborators.




> > Heiko wrote:
> > > What could be achievable on TDF infrastructure?
> >
> > Given what I've said above - let's try to make this completely
> > independent of TDF infrastructure. Either with no switching-server
> > at all or with something minimal that hopefully might not even need
> > TDF continuously maintaining a server. Note that maintenance by us
> > also has privacy implications, much more so than third-party-less
> > P2P.
>
> Yup. At any rate, requiring any kind of centralized server
> infrastructure has inevitable scalability challenges. It would still
> be useful if TDF could help with bootstrapping whatever server
> infrastructure will be needed, though.


Agreed; let's just think of such a bootstrapping as something to avoid
if possible, even at the cost of some tradeoffs. This relates
specifically to my two points of motivation for this feature (which I'll
quote for those who follow this list but not the forum): This

"1. Undermines the paradigm Microsoft, Google and others are pushing,
of your work as a user going through them, visible and data-mine-able to
them, requiring connectivity to their servers…
 2. Offers potential for more easily

Re: [TDF Community] [Board Discuss] LibreOffice - peer2peer collaboration bits

2024-05-04 Thread Thorsten Behrens
Hi Eyal, all,

this came up on board-discuss [1], but I believe the best place to
have this discussion, is on the LibreOffice developer list. Let's
please follow-up here.

A few comments in-line -

Eyal Rozenberg wrote:
> Heiko wrote:
> > How to connect two or more individuals? It requires routing
> >
> I would opt for a simple protocol which does not take on problems
> more complex than it has to, at least for a preliminary
> implementation. Specifically - the Internet routes for us.
>
Connecting people for peer2peer communication is conventionally called
discovery.

There's ample writeups on the topic, here's two pointers:

* quite an accessible summary: https://jsantell.com/p2p-peer-discovery/
* scientific survey paper: 
https://www.inets.rwth-aachen.de/wp-content/uploads/2022/07/service_discovery_survey.pdf

> Heiko wrote:
> > given that users do not want to fiddle around with ports and
> > firewalls, nor to share IP addresses, I presume this requires a
> > server.
> But then it would not really be P2P, would it?
> 
> Anyway - a server would not necessarily be required. That is, P2P
> connection will happen when two users want to connect; but in our
> case, they have already "connected" in some other way to agree on
> making the P2P connection. So I suggest, in light of my previous
> point, that we assume that the two (or more) users have another,
> independent means of communication over which they can send some
> data for bootstrapping the LibreOffice P2P. And this could be made
> easy, UI-wise, so that the user just needs to press a copy button,
> and paste some string so that the other user can see it. The other
> user copies the string and pastes it into an appropriate area in
> their own running LO instance. Then the connection is set up.
> 
So in a word, piggy-backing on another, existing communication
channel?

> What kind of data? Basically, I assume that would be a tuple of (IP,
> port number, public key). I will admit that this doesn't cater to
> the case of two firewalled users; that's a situation I'm not
> experienced enough in handling, but I do know there are [many
> approaches](https://en.wikipedia.org/wiki/NAT_traversal) (Wikipedia)
> to handling it. Some may require a third-party "switching server",
> some may not. But such a server can probably be very minimal and
> hopefully not even aware of what protocol it's being used to allow
> connections for.
>
Or taking the idea one step further: re-using the other, existing
comms channel, also for all of the collaboration traffic!
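
As an illustration of the copy-and-paste bootstrap quoted above, the (IP, port
number, public key) tuple could be packed into a single string roughly as
follows. This is only a sketch: the `PeerInfo` type, the function names and the
JSON-in-base64 encoding are assumptions made for the example, not anything
LibreOffice defines, and the public key is treated as an opaque text value.

```python
import base64
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerInfo:
    """Hypothetical bootstrap data: where to reach a peer and how to verify it."""
    host: str        # IP address the inviting peer listens on
    port: int        # port number
    public_key: str  # the peer's public key as opaque text (e.g. base64)


def encode_invite(info: PeerInfo) -> str:
    """Pack the tuple into one URL-safe string suitable for copy and paste."""
    payload = json.dumps([info.host, info.port, info.public_key]).encode("utf-8")
    return base64.urlsafe_b64encode(payload).decode("ascii")


def decode_invite(token: str) -> PeerInfo:
    """Reverse encode_invite(); raises ValueError on malformed input."""
    try:
        host, port, public_key = json.loads(base64.urlsafe_b64decode(token.strip()))
    except (ValueError, TypeError) as exc:
        raise ValueError("not a valid invite string") from exc
    if not isinstance(port, int) or not 0 < port < 65536:
        raise ValueError("port out of range")
    return PeerInfo(str(host), port, str(public_key))
```

One user would paste the output of `encode_invite()` into a chat or mail, and
the other would feed it to `decode_invite()`. As noted in the quoted text, this
does not address the case where both users are behind firewalls or NAT.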

> Heiko wrote:
> > What could be achievable on TDF infrastructure?
> >
> Given what I've said above - let's try to make this completely
> independent of TDF infrastructure. Either with no switching-server
> at all or with something minimal that hopefully might not even need
> TDF continuously maintaining a server. Note that maintenance by us
> also has privacy implications, much more so than third-party-less
> P2P.
>
Yup. At any rate, requiring any kind of centralized server
infrastructure has inevitable scalability challenges. It would still
be useful if TDF could help with bootstrapping whatever server
infrastructure will be needed, though.

> Heiko wrote:
> > Isn’t it better to share UNO commands and parameters?
> >
> Mmm... maybe... but - what about showing the other party's cursor
> and mouse movements? You can't do that with UNO commands.
>
Starting off from Collabora Online - which is a production-ready
implementation of LibreOffice collaboration, that uses both low-level
key & mouse events, as well as UNO commands - I guess the answer is
'both'? ;)
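
A sketch of what "both" could mean on the wire: one message envelope that
carries either a low-level input event or a UNO command. The `Message` type and
its field names are invented for this example and are not Collabora Online's
actual format; `.uno:Bold` is used only as a familiar command name.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class Message:
    """Hypothetical wire message; not Collabora Online's actual format."""
    sender: str              # identifier of the peer that produced the message
    kind: str                # "input" for key/mouse events, "uno" for commands
    payload: Dict[str, Any]  # event details, or command name plus arguments


def to_wire(msg: Message) -> str:
    """Serialize a message to JSON, rejecting unknown kinds."""
    if msg.kind not in ("input", "uno"):
        raise ValueError(f"unknown message kind: {msg.kind}")
    return json.dumps(asdict(msg), sort_keys=True)


def from_wire(text: str) -> Message:
    """Parse a JSON string produced by to_wire()."""
    return Message(**json.loads(text))


# A remote cursor move travels as a low-level event ...
cursor = Message("peer-a", "input", {"type": "mousemove", "x": 120, "y": 48})
# ... while a formatting change travels as a UNO command.
bold = Message("peer-a", "uno", {"command": ".uno:Bold", "args": {}})
```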

> Heiko wrote:
> > How do we solve the situation when one participant enters text and
> > another deletes the same paragraph?
>
> It doesn't have to be a great solution, as long as it is
> consistent. i.e. if users know that two people on a laggy connection
> editing the same sentence is likely to get them making changes in
> wrong positions etc., they will naturally limit the extent to which
> they do this - like we know from Etherpad. Consistency of behavior
> and "principle of least astonishment" would be more important than
> perfect coordination/synchronization of inputs.
>
With a dedicated server, you don't even have that problem. All input
will get serialized through this instance, so there's a strict
temporal ordering for all edits. Whatever packet reaches the server
first will 'win' in an edit war. A fully distributed solution (which
is way harder to implement!) has no such strict global ordering per
se, but there's algorithms such as CRDTs[2], which guarantee eventual
consistency in all peers. But you're right, the Etherpad experience
shows that under bad network connectivity, user experience will start
to suffer. For example, all CRDTs I've looked at would always have a
'delete' operation win over other edits, on the same span of text.
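
To illustrate eventual consistency without a central ordering, here is a toy
state-based CRDT in which a delete wins over a concurrent edit. It is a
deliberately simplified sketch, not taken from any existing library or from
LibreOffice: paragraphs are last-writer-wins values keyed by a hypothetical
paragraph id, deletions are tombstones that are never removed, and all names
(`Replica`, `write`, `merge`, ...) are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Stamp = Tuple[int, str]  # (logical clock, peer id); the peer id breaks ties


@dataclass
class Replica:
    """Toy CRDT: last-writer-wins paragraphs plus a grow-only tombstone set."""
    peer: str
    clock: int = 0
    paragraphs: Dict[str, Tuple[Stamp, str]] = field(default_factory=dict)
    tombstones: Set[str] = field(default_factory=set)

    def _tick(self) -> Stamp:
        self.clock += 1
        return (self.clock, self.peer)

    def write(self, para_id: str, text: str) -> None:
        self.paragraphs[para_id] = (self._tick(), text)

    def delete(self, para_id: str) -> None:
        self.tombstones.add(para_id)  # tombstones are never removed

    def merge(self, other: "Replica") -> None:
        """Join two states; commutative, associative and idempotent."""
        for para_id, entry in other.paragraphs.items():
            mine = self.paragraphs.get(para_id)
            if mine is None or entry[0] > mine[0]:
                self.paragraphs[para_id] = entry
        self.tombstones |= other.tombstones
        self.clock = max(self.clock, other.clock)

    def visible(self) -> Dict[str, str]:
        """Paragraphs that have not been deleted by any peer."""
        return {p: text for p, (_, text) in self.paragraphs.items()
                if p not in self.tombstones}


a, b = Replica("a"), Replica("b")
a.write("p1", "Hello")
b.merge(a)
a.write("p1", "Hello, world")  # peer a keeps typing ...
b.delete("p1")                 # ... while peer b deletes the paragraph
a.merge(b)
b.merge(a)
# Both replicas now agree: the paragraph is gone, the delete has won.
```

Whichever order the two merges happen in, both peers end up with the same
visible state. The price is the one described above: the text typed by peer a
is silently lost.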

> Heiko wrote:
> > Encryption and data integrity is key.
> >
> Perhaps TLS if it's a TCP-based protocol, and