Hi Rodrigo, > Dillo has a mechanism to read chunks of data from different sources as they > are arriving and pass them to the next stage for processing. However, AFAIK > it always reads a chunk and appends it to a large buffer. It doesn't free the > processed part until is done with the whole thing. > > This would require a change in the way Dillo processes data, but I think it > would be required for large files. There are more details in the > devdoc/CCCwork.txt file and in src/chain.c if you want to take a closer look. > > As I'm planning to change the design of the CCC, I think I can take this into > account too so it would be doable. I'll add it to the list of shortcomings of > the current design.
Thank you. I am still unfamiliar with that part of Dillo, so please let me know about any progress. > Okay, I'll focus on the boundary patch first, which is the easiest to merge > and then I'll take a closer look at the others. > > Yeah, I would assume a lot of implementations are broken, so we want to try > to minimize the chances we run into problems. Limiting ourselves to a-z, A-Z and 0-9 would still account for 62 out of the 75 possible characters, so roughly 82% of the set. I think that removing the quoting in favour of the limited set reduce the risk for broken implementations, yet still provide a good amount of randomness. > Check sizeof " ": https://godbolt.org/z/7Tso8ooYz Interestingly, the " " character on your last email is not really a <space> (<U0020>): $ printf "%s" " " | hd 00000000 e2 80 88 |...| 00000003 Compared to an ASCII whitespace: $ printf "%s" " " | hd 00000000 20 | | 00000001 Both Godbolt and my editor also flag that multi-byte character with a yellow rectangle around it because it would be highly confusing otherwise. For example: printf("len=%zu\n", strlen(" ")); Confusingly returns "len=3". I am not sure whether this was an intentional modification from your side. My patch is adding a <space> as defined by POSIX.1-2017 [1], so that sizeof " " would always return 2. Was it your intention to flag this potential confusion? Also, there was not strict reason to use sizeof " ". Any other character would do e.g.: sizeof "x", sizeof "A", etc. > You can also use dStr_append_c() to only append one character, so you only > need a single character. That would be an unnecessary use of the heap, because the size is static. > If we only use alphanumeric characters, we can just use isalnum() right? According to POSIX.1-2017 [2], isalnum(3) depends on the current locale configured by the system. For example, characters such as Ä or ú could return non-zero. To avoid this, there are two possible solutions: 1. Use isalnum_l(3) to specify a locale_t object corresponding to the "POSIX" locale (equivalent to "C" [3]), which must be previously allocated by the newlocale(3) function [3] and released by the freelocal(3) function [4]. A minimalist example is shown below: locale_t l = newlocale(LC_CTYPE, "POSIX", NULL); for (unsigned char i = 0; i < 255; i++) printf("hhu=%hhu, c=%c, isalnum=%d\n", i, i, isalnum_l(i, l)); freelocale(l); 2. Define a known subset from the portable character set defined by POSIX.1-2017 [5] and use strspn(3), as already suggested by the patch. IMHO this approach is better because: - It does not deal with locales, so developers not familiar with them would understand the code better. - It is also portable outside a POSIX environment (not sure if this a requirement, though). - It does not require dynamic allication via newlocale(3). - It is the only possible option if non-alnum characters, such as ':' or '/', are appended to the boundary string. [1]: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap06.html [2]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/isalnum.html [3]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/newlocale.html [4]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/freelocale.html [5]: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap06.html > I meant when is the next KoVoꓘ concert :-) No gigs ahead, but I will keep you informed. :) Best regards, Xavi On 28/8/24 22:47, Rodrigo Arias wrote: > Hi Xavier, > > On Wed, Aug 28, 2024 at 01:04:04AM +0200, Xavier Del Campo Romero wrote: >> Hi Rodrigo, >> >>> Glad to read that you also consider Dillo for slcl, and thanks for >>> preparing the patches :-) >> >> Thank you! I want slcl to be useful to anyone, including users who care >> about minimalist software like Dillo. The web is already too crowded >> with bloated "webapps" and other terrible things. :) > > Agreed! > >>> Sounds good, not sure how complicated it would be to do this. >> >> I still need to investigate this further, but I assume this would >> require Dillo to at least implement a sink callback. >> >> In other words, the component responsible for transmitting the data >> (probably src/IO/IO.c) should trigger a user-defined callback with an >> arbitrarily-sized buffer (typically, of BUFSIZ bytes, as defined by >> stdio.h) that must filled with file data. Then, the user-defined >> callback can fill from zero up to BUFSIZ bytes, which are eventually >> trasmitted to the server. > > Dillo has a mechanism to read chunks of data from different sources as > they are arriving and pass them to the next stage for processing. > However, AFAIK it always reads a chunk and appends it to a large buffer. > It doesn't free the processed part until is done with the whole thing. > > This would require a change in the way Dillo processes data, but I think > it would be required for large files. There are more details in the > devdoc/CCCwork.txt file and in src/chain.c if you want to take a closer > look. > > As I'm planning to change the design of the CCC, I think I can take this > into account too so it would be doable. I'll add it to the list of > shortcomings of the current design. > >> That said, I am still not sure how much actual effort this would take. >> But I am glad to receive positive feedback so far - I will then continue >> to find a solution. >> >>> However, being able to upload multiple files at the same time sounds >>> reasonable, so feel free to try on your own in the meanwhile. >> >> Uploading multiple files at once seems doable - the patches I sent on my >> previous email are probably already doing most of the required work. >> Again, the trickiest task is to send data on-the-fly for each selected >> file. > > Okay, I'll focus on the boundary patch first, which is the easiest to > merge and then I'll take a closer look at the others. > >> >>> Shouldn't it be 68 then? >> >> I understand the opposite: the boundary string with the two leading >> dashes ("--") included can be up to 72 bytes long, and 74 bytes long for >> the ending boundary (which includes two more dashes after the boundary >> string). This is confirmed by reading the BNF defined by RFC 2046 (some >> bits omitted for simplicity), section 5.1.1 [1]: >> >>> boundary := 0*69<bchars> bcharsnospace >>> bchars := bcharsnospace / " " >>> bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" / >>> "+" / "_" / "," / "-" / "." / >>> "/" / ":" / "=" / "?" >>> dash-boundary := "--" boundary >>> ; boundary taken from the value of >>> ; boundary parameter of the >>> ; Content-Type field> multipart-body := >>> [preamble CRLF] >>> dash-boundary transport-padding CRLF >>> body-part *encapsulation >>> close-delimiter transport-padding >>> [CRLF epilogue] >>> delimiter := CRLF dash-boundary >>> close-delimiter := delimiter "--" > > Oh right! I see that we are already using 70 characters anyway. > >> Note: even if the specification tells receivers to handle transport >> padding, for the time being I am assuming "transport-padding" as zero >> length since composers must not generate non-zero length transport >> padding. I am still not sure where transport padding would apply, >> anyway. Probably outside web browsers? >> >>> I would leave out all the symbols to avoid quoting and only use A-Z >>> a-z and 0-9. >> >> Interestingly, Dillo would always quote boundary strings [2], even if >> only using A-Z, a-z and 0-9. In fact, this is one of the wrong >> assumptions I spotted when testing slcl against Dillo. > > Yeah, I would assume a lot of implementations are broken, so we want to > try to minimize the chances we run into problems. > > Apart from slcl we should also test this with some sites and see if they > continue to work okay. > > This will also increase the fingerprinting information to distinguish > Dillo among other browsers, but I think it is not more information that > the already leaked by the user agent. > >>> Which, if I computed it correctly, is still too small to worry about. >> >> Not only it is too small of a chance: if we really wanted to do "the >> right thing" and make Dillo absolutely sure the boundary string is not >> contained within the selected files, this would imply a noticeable >> performance impact when dealing with large files, much likely for a >> near-zero benefit. >> >> I have not inspected their source code yet (and I do not want to), but I >> understand both Gecko and Chromium are also making that assumption, >> because otherwise it would take them a lot of CPU time to upload large >> files. > > But then they would be doing such assumption with a "much larger" > probability it hits the file. > > Skipping it with 70 characters is safe for one file, but also probably > safe for all files ever uploaded with Dillo. > > Maybe curl or other small codebases are easier to read, but not really > needed. > >> >>> Why sizeof " " instead of just 2? >> >> Because, to my eyes, sizeof " " has more meaningful semantics, compared >> to a magic integer constant such as 2. However, for this simple >> scenario, I would still consider both acceptable. > > Check sizeof " ": https://godbolt.org/z/7Tso8ooYz > > You can also use dStr_append_c() to only append one character, so you > only need a single character. > > If we only use alphanumeric characters, we can just use isalnum() right? > >> I can replace it with 2 if you find the other construct unacceptable. >> >>> PS: When are you playing? >> >> Sorry, I did not understand your last sentence. Could you please give a >> bit more context? :) > > I meant when is the next KoVoꓘ concert :-) > > Best, > Rodrigo. > _______________________________________________ > Dillo-dev mailing list -- dillo-dev@mailman3.com > To unsubscribe send an email to dillo-dev-le...@mailman3.com
OpenPGP_0x84FF3612A9BF43F2.asc
Description: OpenPGP public key
OpenPGP_signature.asc
Description: OpenPGP digital signature
_______________________________________________ Dillo-dev mailing list -- dillo-dev@mailman3.com To unsubscribe send an email to dillo-dev-le...@mailman3.com