“I think the reason is that websocket frame is divided between different TCP 
packets.”

Just a small reminder that WebSocket is a message-based protocol; frames are 
the underlying transport unit, which in turn travel in TCP packets down the 
line.
A WS message can be partitioned into WS frames in different ways. For example, 
the text message “Hello beautiful world!” can be delivered like:


Frame(“Hello beautiful world!”) - 1 frame, or
Frame(“He”) + Frame(“llo beauti”) + Frame(“ful wor”) + Frame(“ld!”) - 4 
frames

It is up to the WebSocket server implementation how it partitions its 
messages. In the second case, each complete WS frame contains only a partial 
piece of the whole message.
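
To make the two partitionings above concrete, here is a minimal sketch in C. 
The `ws_fragment` struct and the `ws_join` helper are hypothetical names made 
up for this illustration (they are not libcurl API); only the payload and the 
FIN bit of each frame are modelled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one WebSocket data frame: only the
   payload and the FIN bit matter for this illustration. */
struct ws_fragment {
  const char *payload;
  int fin;               /* 1 on the last frame of a message */
};

/* Concatenate fragments into 'out' (capacity 'cap', including the NUL).
   Returns the message length, or -1 if the buffer is too small or no
   frame with FIN set was seen. */
static long ws_join(const struct ws_fragment *frags, size_t count,
                    char *out, size_t cap)
{
  size_t used = 0;
  if(cap == 0)
    return -1;
  for(size_t i = 0; i < count; i++) {
    size_t len = strlen(frags[i].payload);
    if(len >= cap - used)      /* keep one byte for the terminator */
      return -1;
    memcpy(out + used, frags[i].payload, len);
    used += len;
    if(frags[i].fin) {         /* FIN marks the end of the message */
      out[used] = '\0';
      return (long)used;
    }
  }
  return -1;
}
```

Feeding it either the single frame or the four frames from the example yields 
the same message; the frame boundaries carry no meaning for the application.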

And if a text/binary WS message is very large, the WS server can split it into 
multiple parts delivered in separate WS frames, whose boundaries may not be 
aligned with internal boundaries such as ends of words or JSON objects.
In other words, the “full frame” mechanism in libcurl covers only one very 
specific case: when a WS message uses exactly one WS frame (no matter how huge 
it is), which is not the case for all WS server implementations.

When we discussed future WebSocket support in libcurl here in the past, I 
mentioned that after implementing support for the “raw” and “frame” WS levels, 
libcurl should eventually provide the “message” level as well.

The WS message level should assemble incoming WS frames into messages on the 
fly (and handle cases such as control frames received in the middle of a 
large fragmented Text/Binary message).

It should provide both a “streaming” interface with some kind of “write 
functions”, which would allow handling very large WS messages without blowing 
up memory,
and a “buffer” mode, in which the incoming message is stored in a message 
buffer (with an option to specify its size) and delivered to the client in one 
piece.

And in “buffer” mode, if the message is too large to keep in the buffer, it 
should trigger a “Too Large” WS error, as the WebSocket protocol prescribes.
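
A rough sketch of what such a “buffer” mode assembler could look like, in C. 
This is not my actual implementation and not libcurl API: `ws_assembler`, 
`ws_feed_frame` and the `ws_feed` result names are hypothetical. The opcode 
values and close code 1009 (message too big) are the ones defined in RFC 6455.

```c
#include <stddef.h>
#include <string.h>

/* Opcodes and the close code are taken from RFC 6455. */
enum { WS_OP_CONT = 0x0, WS_OP_TEXT = 0x1, WS_OP_BINARY = 0x2,
       WS_OP_CLOSE = 0x8, WS_OP_PING = 0x9, WS_OP_PONG = 0xA };
#define WS_CLOSE_TOO_BIG 1009  /* caller would send this in a Close frame */

/* Hypothetical result codes for feeding one frame to the assembler. */
enum ws_feed { WS_NEED_MORE, WS_MESSAGE_READY, WS_CONTROL,
               WS_TOO_BIG, WS_PROTOCOL_ERROR };

struct ws_assembler {
  unsigned char *buf;  /* caller-provided message buffer */
  size_t cap;          /* configured maximum message size */
  size_t len;          /* bytes collected so far */
  int in_message;      /* nonzero while a fragmented message is open */
};

static enum ws_feed ws_feed_frame(struct ws_assembler *a, int opcode, int fin,
                                  const void *payload, size_t n)
{
  if(opcode & 0x8)
    return WS_CONTROL;          /* ping/pong/close: message state untouched */
  if(opcode == WS_OP_CONT) {
    if(!a->in_message)
      return WS_PROTOCOL_ERROR; /* continuation without a first frame */
  }
  else if(opcode == WS_OP_TEXT || opcode == WS_OP_BINARY) {
    if(a->in_message)
      return WS_PROTOCOL_ERROR; /* new message before the old one ended */
    a->len = 0;
    a->in_message = 1;
  }
  else
    return WS_PROTOCOL_ERROR;   /* reserved opcode */
  if(n > a->cap - a->len) {     /* message would exceed the buffer */
    a->in_message = 0;
    a->len = 0;
    return WS_TOO_BIG;
  }
  if(n)
    memcpy(a->buf + a->len, payload, n);
  a->len += n;
  if(!fin)
    return WS_NEED_MORE;
  a->in_message = 0;
  return WS_MESSAGE_READY;      /* a->buf[0..a->len) holds the message */
}
```

On `WS_TOO_BIG` the caller would close the connection with `WS_CLOSE_TOO_BIG`. 
A “streaming” mode would hand each payload to a write function instead of 
copying it into `buf`, so no size limit is needed there.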

So, if we are talking about a roadmap for future WebSocket features, I think 
that “message” level support should be the next step.
I use this approach in my C++ implementation of the WebSocket protocol on top 
of libcurl; it works well with different WS server implementations and can 
handle huge WS messages even on embedded devices.

Thanks,
Dmitry Karpov


From: curl-library <curl-library-boun...@lists.haxx.se> On Behalf Of Timothe 
Litt via curl-library
Sent: Friday, February 3, 2023 6:36 AM
To: curl-library@lists.haxx.se
Cc: Timothe Litt <l...@acm.org>
Subject: [EXTERNAL] Re: WebSocket feature request: is it possible to call write 
function when full frame is loaded only?


On 03-Feb-23 03:27, Daniel Stenberg via curl-library wrote:
On Fri, 3 Feb 2023, Vitalii B. Avramenko via curl-library wrote:


Such partial data may be OK for the HTTP protocol, where we know for sure that 
we have a "request/response" pattern and can detect the end of data from the 
HTTP protocol itself, for example, with the `Content-Length` header. But with 
websocket, generally speaking, we don't have any way to know where the end of 
a frame is with `CURLOPT_WRITEFUNCTION`.

Yes we do: curl_ws_meta() is provided to give you exactly that information!
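
As a rough illustration of that point: the `struct curl_ws_frame` that 
`curl_ws_meta()` returns inside the write callback carries `offset` and 
`bytesleft` fields, and a chunk with `bytesleft == 0` is the last piece of its 
frame. The sketch below uses a stand-in struct with just those two fields so 
that it compiles without libcurl; `frame_meta`, `frame_buf` and `on_chunk` are 
hypothetical names, and the fixed 256-byte buffer is an arbitrary assumption.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the two fields of libcurl's struct curl_ws_frame used here.
   Real code would get the struct from curl_ws_meta() inside the write
   callback instead of receiving it as a parameter. */
struct frame_meta {
  long long offset;     /* position of this chunk within the frame payload */
  long long bytesleft;  /* payload bytes of this frame still to come */
};

struct frame_buf {
  char data[256];       /* arbitrary size for the illustration */
  size_t len;
  int complete;         /* set once the whole frame has arrived */
};

/* Hypothetical helper: append one delivered chunk to the frame buffer.
   Returns 0 on success, -1 if the chunk does not fit. */
static int on_chunk(struct frame_buf *fb, const struct frame_meta *m,
                    const char *chunk, size_t n)
{
  if(m->offset == 0) {  /* first chunk of a new frame */
    fb->len = 0;
    fb->complete = 0;
  }
  if(n > sizeof(fb->data) - fb->len)
    return -1;
  memcpy(fb->data + fb->len, chunk, n);
  fb->len += n;
  if(m->bytesleft == 0) /* nothing left: the frame is complete */
    fb->complete = 1;
  return 0;
}
```

Note that this only tells the application where a frame ends, not where a 
fragmented message ends.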


we need a guarantee that `CURLOPT_WRITEFUNCTION` will call our callback when 
full frame is downloaded only, or at least we need the option that will allow 
us to request such behavior (something like 
`CURLOPT_WEBSOCKET_FULL_FRAMES_ONLY`).

I have been thinking about adding a mode for the websocket API that delivers 
full frames only, but I have hesitated a bit: since frames can be up to 2^63 
bytes big, we need to decide how to handle (too) big frames in such a mode.

What do you think is a reasonable behavior for a full-frame mode when it 
receives (ridiculously) large frames?

There's always an upper bound - no one has 2^63 bytes of swap space, memory, or 
disk space to store an extremely large frame.  And it's not likely in the 
foreseeable future.

I think it's up to the application to decide what it's willing to handle.   I 
don't think there's a universal answer of how.  Maybe it calls getrlimit 
(RLIMIT_DATA) - or RLIMIT_FSIZE.  Or it looks at free space on its output disk. 
 Or bases it on estimated processing time.  Or ...
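
One way an application might derive such a bound, sketched in C for POSIX 
systems. `pick_frame_cap` is a hypothetical helper, and using a quarter of the 
RLIMIT_DATA soft limit is an arbitrary assumption for the illustration, not a 
recommendation.

```c
#include <stdint.h>
#include <sys/resource.h>

/* Hypothetical helper: choose the largest frame the application accepts,
   as the smaller of its own ceiling and a quarter of the RLIMIT_DATA
   soft limit (the "quarter" is an arbitrary choice). */
static uint64_t pick_frame_cap(uint64_t app_ceiling)
{
  struct rlimit rl;
  uint64_t budget;
  if(getrlimit(RLIMIT_DATA, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
    return app_ceiling;   /* no usable limit: fall back to the ceiling */
  budget = (uint64_t)rl.rlim_cur / 4;
  return budget < app_ceiling ? budget : app_ceiling;
}
```

The same shape works for the other ideas: swap the `getrlimit()` call for a 
free-disk-space query or a processing-time estimate.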

For full frames, if you can't set an upper bound, your protocol user needs to 
rethink its usage.  If your application really can deal with huge (beyond 
practical VM sized) data, it pretty much has to handle it in a stream - so 
FULL_FRAMES would be inappropriate.

So, here's a simple answer:  Provide a setting for the maximum acceptable full 
frame size.  On a FULL_FRAMES_ONLY connection, curl buffers any frame up to 
that size and provides it in the callback.  For anything bigger (or if curl 
can't allocate the buffer memory, times out waiting for it, etc.), curl returns 
an error (FRAME_TOO_BIG), aborts the connection and calls writefunction with 
NULL in the *ptr argument and the actual size in 'size'.

This provides the application with sufficient information to log the failure or 
even retry the request.

And to simplify the API, perhaps the setting should be 
"CURLOPT_WEBSOCKET_FULL_FRAMES_UPTO, <size>", and let zero be the current 
incremental delivery mode.
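
A sketch of how that proposal could behave, in C. None of this exists in 
libcurl: `frame_policy`, `app_state` and `app_write` are hypothetical names, 
and the callback here only mimics the shape of a write function to show how an 
application would notice the NULL-pointer signal.

```c
#include <stddef.h>
#include <stdint.h>

/* Possible outcomes under the proposed setting (hypothetical names). */
enum frame_action { FRAME_INCREMENTAL, FRAME_BUFFER, FRAME_TOO_BIG };

/* 'upto' is the proposed maximum full-frame size; zero keeps the current
   incremental delivery. 'frame_len' is the length from the frame header. */
static enum frame_action frame_policy(uint64_t upto, uint64_t frame_len)
{
  if(upto == 0)
    return FRAME_INCREMENTAL;
  if(frame_len > upto)
    return FRAME_TOO_BIG;
  return FRAME_BUFFER;
}

struct app_state {
  int failed;              /* set when a frame was rejected */
  uint64_t rejected_size;  /* actual size of the rejected frame */
  size_t received;         /* payload bytes accepted so far */
};

/* Application side: a NULL pointer signals a rejected frame and 'size'
   then carries its actual size, as described in the proposal. */
static size_t app_write(const char *ptr, size_t size, void *userdata)
{
  struct app_state *s = userdata;
  if(!ptr) {
    s->failed = 1;
    s->rejected_size = size;
    return 0;
  }
  s->received += size;
  return size;
}
```

With `rejected_size` recorded, the application can log the failure or retry 
with a larger limit.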

Vitalii can set <size> to a few GB if he can handle it.  Or if he is willing to 
go until the OOM killer hits him, he can set size to 2^63-1 and see where fate 
takes him.  Having lived thru "32 bits is so big that limits aren't necessary", 
I don't think that's a wise approach...
Timothe Litt

ACM Distinguished Engineer

--------------------------

This communication may not represent the ACM or my employer's views,

if any, on the matters discussed.