Hi Xavier,
On Mon, Aug 26, 2024 at 01:27:29AM +0200, Xavier Del Campo Romero wrote:
Hello Dillo dev community,
I am testing support among web browsers for slcl [1], a JS-less
minimalist web storage solution. slcl relies on
"multipart/form-data"-encoded HTTP requests to upload files to a server.
Whereas Dillo helped me to uncover a few wrong assumptions on my code, I
have realised a few issues on Dillo itself related to filue uploads that
should be considered.
Glad to read that you also consider Dillo for slcl, and thanks for
preparing the patches :-)
Issue #1:
While Dillo is fine uploading small files (up to a few MiB), things go
wrong with larger files, so much that memory usage increases up to
multiple GiB and can even lock the system up. This is because Dillo is
designed to send requests always from memory, which means file contents
must be dumped into memory first, and this might be unfeasible for large
files.
Suggestion:
Instead, Dillo should ideally send file contents on-the-fly, so that
memory usage is kept to a minimum regardless the file size.
Sounds good, not sure how complicated it would be to do this.
Issue #2:
Even if Dillo generates a 70-byte, random boundary string (yet mostly
filled with '-', similarly to Firefox [2]), it ensures it is not found
anywhere inside the file contents. Again, this can be a serious
bottleneck in the case of large files, as it requires to scan the whole
file for a match.
Suggestion:
Define *all* of the 70 bytes in the boundary string as random, and
assume they would never be found inside a file. The chance of accidental
collision is so low that it is not worth the effort into checking them.
Suggested patches:
- 0001-dialog.cc-Generate-more-random-boundaries.patch)
Yes, I think is a good idea.
@@ -1246,6 +1246,24 @@ Dstr
*DilloHtmlForm::buildQueryData(DilloHtmlInput
*active_submit)
return DataStr;
}
+static void generate_boundary(Dstr *boundary)
+{
+ for (int i = 0; i < 70; i++) {
I think this is too long:
Boundary delimiters must not appear within the encapsulated material,
and must be no longer than 70 characters, not counting the two
leading hyphens.
Shouldn't it be 68 then?
+ /* Extracted from RFC 2046, section 5.1.1. */
+ static const char set[] = "abcdefghijklmnopqrstuvwxyz"
+ "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+ "0123456789"
+ "'()+_,-./:=? ";
I would leave out all the symbols to avoid quoting and only use A-Z a-z
and 0-9.
Assuming you are left with 62 symbols (26*2 + 10), the probability of
finding a random 70 character string in a random sample of 70 characters
is (using ** as ^):
p = 1/62**70 = 3.4086e-126
But as you will be dealing with large files, each line of the input
could match it. We are interested in the probability that a given
boundary will mach any of the N lines. So we compute the probability
that it will match none of the lines (1-p)**N and then the opposite of
that (matches at least one line):
1 - (1 - p) ** N
For example, a 1 TB encoded file, with 2**40/70 lines will have:
p = 1 - (1 - 1/62**70) ** (2**40 / 70)
Of occurring, which is around 5.354e-116.
Which, if I computed it correctly, is still too small to worry about.
+ char s[sizeof " "] = {0};
Why sizeof " " instead of just 2?
+
+ do {
+ *s = rand();
+ } while (!strspn(s, set));
+
+ dStr_append(boundary, s);
+ }
+}
+
/**
* Generate a boundary string for use in separating the parts of a
* multipart/form-data submission.
Issue #3:
Dillo only supports uploading 1 file at a time. This is mostly because
it relies (probably temporarily?) on the a_Dialog_save_file function
[3]. However, this is not a limitation on the HTTP protocol, and other
implementation such as Firefox or Chromium-based browsers support this.
Suggested patches:
- 0002-dialog-Add-a_Dialog_select_files.patch
- 0003-WIP-multi-file-uploads.patch
Conclusions:
Dillo seems designed to always send requests from memory, so it is not
straightforward to break this assumption in order to support large file
uploads. 0003-WIP-multi-file-uploads.patch is an incomplete first step
into fixing this, but it surely needs deeper design changes.
I did not put more effort into these patches for the time being because,
after seeing the potential complexity behind this task, I thought it was
a better idea to ask the community for feedback and guidelines.
I'm not very familiar with this part of Dillo, so I'll have to check a
bit more before providing more feedback.
However, being able to upload multiple files at the same time sounds
reasonable, so feel free to try on your own in the meanwhile.
PS: When are you playing?
Best,
Rodrigo.
_______________________________________________
Dillo-dev mailing list -- dillo-dev@mailman3.com
To unsubscribe send an email to dillo-dev-le...@mailman3.com