Ok, this is kinda long. I rewrote about 800 lines of code, in 11 different 
files, and pushed, and hope it didn't break anything. Here is the roadmap.

1. It would be nice if we could fetch all the javascript files from the 
internet in parallel, and the css files too. I'm sure other browsers do that. 
It would speed things up.
Sometimes there are 10 or more of these files to fetch.
We already spin off processes to download files in the background, so you'd 
think the machinery is mostly there.
It's Unix only, but I don't care, I don't think I have a single Windows user at 
this point.
I could fork off and download to a temp file and when done read that temp file 
into the js <script> object and off we go.
But it seems less than ideal. There's just a gut feeling that threads would be 
better.
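
The fork-and-temp-file approach in item 1 could be sketched roughly like this. This is only an illustration, not edbrowse code: start_fetch, finish_fetch, and fake_download are hypothetical names, and fake_download is a stand-in that writes a fixed body instead of calling curl.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical stand-in for the real download; it writes a fixed body. */
static int fake_download(const char *url, FILE *out)
{
    return fprintf(out, "/* body of %s */\n", url) < 0 ? -1 : 0;
}

/* Fork a child that "downloads" url into a fresh temp file.
 * Stores the temp file name in path; returns the child's pid, or -1. */
static pid_t start_fetch(const char *url, char *path, size_t pathlen)
{
    static const char tmpl[] = "/tmp/fetchXXXXXX";
    int fd;
    pid_t pid;

    if (pathlen < sizeof tmpl)
        return -1;
    memcpy(path, tmpl, sizeof tmpl);
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fd);
        unlink(path);
        return -1;
    }
    if (pid == 0) {             /* child: write the file, then exit */
        FILE *out = fdopen(fd, "w");
        int rc = out ? fake_download(url, out) : -1;
        if (out && fclose(out) != 0)
            rc = -1;
        _exit(rc == 0 ? 0 : 1);
    }
    close(fd);                  /* parent keeps only the name and the pid */
    return pid;
}

/* Wait for the child, read the temp file into an allocated string,
 * and remove the file. Returns NULL if anything went wrong. */
static char *finish_fetch(pid_t pid, const char *path)
{
    int status;
    long len;
    char *buf = NULL;
    FILE *in;

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)
        || WEXITSTATUS(status) != 0) {
        unlink(path);
        return NULL;
    }
    in = fopen(path, "r");
    if (in) {
        if (fseek(in, 0, SEEK_END) == 0 && (len = ftell(in)) >= 0) {
            rewind(in);
            buf = malloc((size_t)len + 1);
            if (buf) {
                size_t n = fread(buf, 1, (size_t)len, in);
                buf[n] = '\0';
            }
        }
        fclose(in);
    }
    unlink(path);
    return buf;
}
```

The idea would be to call start_fetch once per script or css file, so all the children run at the same time, and then call finish_fetch on each in document order. The temp files and the extra processes are the part that feels less than ideal.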

2. When a script is marked async, it can run asynchronously.
I don't think we're ever going to do that. I need a team of engineers for that, 
not just my spare time.
But ... async can mean postpone.
It means the browse can finish and you can start looking at the file while that 
script runs.
I can just put it on a timer.
Like in 10 seconds go ahead and run the async script.
Ok but in ten seconds you may find yourself locked out for a few seconds while 
the script runs, which is weird.
And if that script does an xhr, and the internet is slow, then you're blocked 
for 20 seconds.
Well, what if that postponed script, or perhaps timers in general, ran in 
another thread while you look around?
Now here's the thing, js + my dom + edbrowse will never be threadsafe.
So if a js timer or script is running over there, you can't run js in the 
foreground over here.
It runs here or there, not both.
If you try, like clicking on a button or doing anything that involves js, then 
I have to stop and block wait for the other thread to finish.
But this is not likely, and still better, I think, than doing all that stuff 
during browse, where you have to wait for it even to see the page.
A lot of these async scripts are google analytics or google ads etc, that we 
just don't care about,
so let them run when they run, and they might update some lines on the page, 
filling in the ads, which you can look at if you care, or not.
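
The "runs here or there, not both" rule in item 2 amounts to one lock around the whole js engine. A minimal sketch with pthreads follows, assuming a single global mutex; js_lock, run_js, and postpone_script are hypothetical names, and run_js just prints instead of running a real script.

```c
#include <pthread.h>
#include <stdio.h>

/* One lock for the whole js engine: it runs here or there, not both. */
static pthread_mutex_t js_lock = PTHREAD_MUTEX_INITIALIZER;
static int js_runs;             /* only touched while js_lock is held */

/* Hypothetical stand-in for running a script or handler in the engine.
 * Whoever calls this blocks until the other thread is out of the engine. */
static void run_js(const char *what)
{
    pthread_mutex_lock(&js_lock);
    js_runs++;
    printf("running %s\n", what);
    pthread_mutex_unlock(&js_lock);
}

/* Thread entry point for a postponed (async) script. */
static void *deferred_script(void *arg)
{
    run_js((const char *)arg);
    return NULL;
}

/* Start the postponed script in the background; returns 0 on success. */
static int postpone_script(pthread_t *t, const char *name)
{
    return pthread_create(t, NULL, deferred_script, (void *)name);
}
```

The foreground would call run_js directly for a button click, and simply wait on the lock if the background script happens to be inside the engine at that moment.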

3. The key to parallel downloads is a threadsafe curl system, not just curl 
itself but all the machinery we built around it.
I'm not expecting to run js in parallel in two separate threads, but we need to 
run curl in two separate threads, or in 10 separate threads.
So where does that leave us?

4. There are the basics of running curl in a threadsafe fashion, and maybe Chris 
can help me with this.
I know you have a 9 to 5, but maybe you remember reading about it and can 
determine if I have to do something different, or special,
or if it's just well behaved already.
And what about the standard calls like stdio and such, are they threadsafe?
And what about the standard calls like stdio and such, are they threadsafe?
Remember that each thread could be dumping data to a common file in debug mode.
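
On the stdio question: POSIX requires each individual stdio call to lock the FILE it works on, so a single fprintf should not get torn apart by another thread, but a debug record built from several calls can still be interleaved. Here is a small sketch of grouping the calls with flockfile; debug_record, log_worker, and struct log_job are hypothetical names, not anything in edbrowse.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>

/* Write one debug record made of two stdio calls. flockfile holds the
 * stream's own lock across both, so no other thread's output can land
 * in the middle of the record. */
static void debug_record(FILE *log, int id, const char *url)
{
    flockfile(log);
    fprintf(log, "[thread %d] ", id);
    fprintf(log, "fetch %s\n", url);
    funlockfile(log);
}

/* What each download thread might be given (hypothetical). */
struct log_job {
    FILE *log;
    int id;
    int count;
};

/* Thread entry point: write count records to the shared log. */
static void *log_worker(void *arg)
{
    struct log_job *job = arg;
    int i;

    for (i = 0; i < job->count; i++)
        debug_record(job->log, job->id, "http://example.com/x.js");
    return NULL;
}
```

With several log_worker threads sharing one FILE, every line in the log should come out whole. Without the flockfile pair, the thread tag and the url could end up on different lines.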

5. What about the framework around curl, primarily httpConnect?
Good lord there are about 20 static variables at the top of http.c that record 
values and states and the like as we step through the http fetch.
I mean it's as far away from threadsafe as the moon!

"Did you work at your regular job, then at 10 o'clock at night just hammer this 
thing together as fast as you could, just to make it work?"

"Yeah, I kinda did."

So this push is about cleaning up my mess, at least some of it.
You're a programmer, so you know the drill.
All those static variables become members of a structure,
struct i_get    (internet get)
If a routine calls httpConnect, and this happens more often than you might 
think, including the xhr request, it has an auto variable
        struct i_get g;
We set it up with url and some parameters, and call httpConnect(&g),
and now everything is on the stack where it belongs.
I've been doing some surfing and it seems to work, but boy there was a lot of 
rewrite!
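
To make the pattern concrete, here is a cut-down sketch of the idea. The fields are made up for illustration and are not the real members of struct i_get, and httpConnect_sketch is a stand-in that formats a string instead of talking to curl.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical fields; the real struct i_get has different members.
 * Everything that used to be a static in the fetch code lives here. */
struct i_get {
    const char *url;            /* input: what to fetch */
    long code;                  /* output: status code */
    char buffer[128];           /* output: the data that came back */
    size_t length;              /* output: bytes in buffer */
};

/* Stand-in for httpConnect: it reads and writes only *g, no statics,
 * so two calls with two different structs cannot disturb each other. */
static int httpConnect_sketch(struct i_get *g)
{
    int n;

    g->code = 0;
    if (!g->url || !*g->url)
        return 0;
    n = snprintf(g->buffer, sizeof g->buffer, "fetched %s", g->url);
    if (n < 0 || (size_t)n >= sizeof g->buffer)
        return 0;
    g->length = (size_t)n;
    g->code = 200;
    return 1;
}

/* A typical caller: the struct is an auto variable on the stack. */
static long fetch_code(const char *url)
{
    struct i_get g;

    memset(&g, 0, sizeof g);
    g.url = url;
    if (!httpConnect_sketch(&g))
        return -1;
    return g.code;
}
```

Since each caller owns its own struct, ten threads could each have one on their own stack, which is the property the parallel downloads need.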

This isn't the end. There's the caching system, the web authorization system, 
finding proxy, the novs domains, I mean it calls things all over that might not 
be threadsafe.
But some of them are only used in the foreground thread, the interactive thread 
that is you typing at the keyboard.
And the whole cache system is probably threadsafe, because I set it up with a 
locking mechanism,
assuming many edbrowse processes would be accessing the same cache.
So these things can be managed one by one I think.
And you know, even if we never do any of the things in this post, it's still a 
better, cleaner design.

Karl Dahlke
_______________________________________________
Edbrowse-dev mailing list
Edbrowse-dev@lists.the-brannons.com
http://lists.the-brannons.com/mailman/listinfo/edbrowse-dev
