Hi,

Le 10 mai 2011 à 14:32, Vincent Sanders a écrit :

> I have explained *at length* why I disagree with your approach. I
> shall briefly outline those arguments once more. I have already stated
> I do not wish to discuss this further as you simply will not accept my
> arguments, this review/summary is provided as a pure courtesy.

I'll keep the branch around as a reference and rewrite it on the -v2 one, but 
here is what I have to say.
I have also tried to explain the reasons why I went this route; I'll summarize 
them again below... Besides, I started it as a joke to see if it would work, so 
I originally just wanted something that worked before getting things cleaned up.

I actually started with the fetcher way inside the curl fetcher, and it almost 
worked. However the data chunks sent from the server did not always end on a 
line boundary, so the simple parser missed entire items. Refusing part of the 
data by reporting fewer bytes consumed than presented also resulted in the 
fetch being aborted (it would actually be handy if the unconsumed data were 
simply kept for the next call). Fixing it would have meant adding caching 
in the fetcher itself. I looked at how llcache could be used for this, but I 
couldn't find any way to use it without presenting data that would end up in a 
content object.
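
For reference, the kind of carry-over buffering that would have been needed 
inside the fetcher looks roughly like this; a minimal sketch with made-up names 
(gopher_state, gopher_process_data), not anything that exists in NetSurf, and 
with error handling omitted:

#include <stdlib.h>
#include <string.h>

/* Hypothetical per-fetch state carried between data callbacks. */
struct gopher_state {
	char *partial;      /* unconsumed tail of the previous chunk */
	size_t partial_len;
};

static void gopher_process_data(struct gopher_state *s,
		const char *data, size_t size)
{
	size_t len = s->partial_len + size;
	char *buf = malloc(len);
	size_t consumed = 0;
	char *eol;

	if (s->partial_len > 0)
		memcpy(buf, s->partial, s->partial_len);
	memcpy(buf + s->partial_len, data, size);

	/* consume only complete newline-terminated item lines */
	while ((eol = memchr(buf + consumed, '\n', len - consumed)) != NULL) {
		/* parse one directory item from buf[consumed] up to eol here */
		consumed = (eol - buf) + 1;
	}

	/* keep the incomplete tail around for the next callback */
	free(s->partial);
	s->partial_len = len - consumed;
	s->partial = malloc(s->partial_len);
	if (s->partial_len > 0)
		memcpy(s->partial, buf + consumed, s->partial_len);
	free(buf);
}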

So I tried the content handler route, and was directed at the old plaintext 
renderer which used the same trick as the current code. Being able to get all 
the data at once simplifies the parser as well, though it is probably not as 
fast.
While it might not follow the current design goal, it does the job, which was 
the idea for a first version.

Also, concerning the fetcher vs. handler approach, I think putting this in the 
fetcher is just as bad, if not semantically incorrect. Putting something that 
massages *content* in a *fetcher* seems wrong. While I agree it makes some 
sense for file:, gopher data has a standardized format which makes it a content 
that needs to be *handled*.
Having to dig out the HTML handler's functions and masquerade as it is just a 
consequence of not being able to create content objects ex nihilo from 
arbitrary data instead of an llcache object tied to a URL.

Besides, "fetching" html content from gopher directories has side effects like, 
one won't be able to view the source data for example. Not very important but 
still...

> We have made extensive efforts to streamline the data flow through the
> browser in the recent past, starting with jmb implementing the low
> level cache, us making the fetcher interface less troublesome and
> culminating in the ongoing content handler refactor. I say this as you
> seem to think this ongoing work is directed at you personally somehow.

No, I just happened to pop up at the wrong time and got a bit irritated.
I understand the effort and will try to follow the same route for the v2 
branch, but I still think it is not entirely logical to put gopher HTML-ization 
in the "fetcher". 

> We have come to understand that the curl library simply should not be
> relied upon beyond its http/https implementations. We have implemented
> fetchers directly for things like file: explicitly. 

At least for gopher I didn't find any bug in curl.
The fact that the data comes in in chunks is most likely just the server 
sending it that way.

The only thing I thought was a bug in curl was actually a bug in the server at 
floodgap.com that has been fixed since then.

While curl could have tried to map gopher errors to HTTP codes, it probably 
doesn't because the only available heuristic is checking whether the first 
character is '3' (the error item type), which doesn't really tell much.
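
The heuristic I mean is roughly this; purely a sketch for illustration, not 
code that curl (or my patch) actually contains:

#include <stddef.h>

/* Guess an HTTP-like status from the first gopher item type character.
 * Gopher itself has no status codes, so this is only a guess. */
static int gopher_guess_http_code(const char *data, size_t len)
{
	if (len > 0 && data[0] == '3')
		return 404;	/* '3' = error item, but the real cause is unknown */
	return 200;
}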

By the way, is there anything I could use to handle the caching of partial item 
lines, or will I have to write it myself?

> Because the curl libraries in actual common use have numerous bugs and
> unwanted behaviours (an example would be curl applies proxy settings
> to file: urls) and the huge memory footprint of curl *in our use case*
> it looks increasingly like a poor fit.

I don't have much experience with cURL, so I can't tell. I actually didn't 
check how proxies should be used for gopher (not gopher-to-HTTP proxies); at 
least some browsers provide a setting for it, but gopher doesn't have any 
provision for vhosts...

> Because of this we have gone to great efforts to *remove* fetcher
> support from the curl library. As discussed at the developer meetings
> previously Daniel Silverstone is working on creating an http library
> to eventually completely remove our dependency on curl.

http://source.netsurf-browser.org/branches/dsilvers/netsurf-remove-curl/ ?
I should probably have a look then...
If possible it should have a lower-level API for doing socket I/O that can be 
reused in non-HTTP things.

Meanwhile I'll try to isolate the curl dependency in my code.

> Content
> -------
> 
> As mentioned to you, The way we have chosen for things that generate
> output (primarily directory listings but the about: handler also) that
> the browser should display as HTML is that the *fetcher* layer returns
> the data as generated HTML. We have created helper functions for this
> generation for directory listings.

Again this goes beyond the job of a *fetcher*, but anyway.

> If the formatted output was not to your liking these content
> generation routines should have been extended and the formatting
> applied using the internal.css.

I already added things to internal.css for gopher.

> The method of inventing a content type and content handler which then
> rummages around inside a different content handler is full of
> problems. Indeed we have worked to completely remove this method of
> operation and built a data model where it is not supported.

cf. above.

>> + * Copyright 2006 James Bursa <bu...@users.sourceforge.net>
>> + * Copyright 2006 Adrian Lees <adri...@users.sourceforge.net>
>> + * Copyright 2011 François Revol <mmu_...@users.sourceforge.net>
> 
> odd you credit james and adrian in a new file?

I originally forked textplain.c as a template, though indeed I didn't keep much 
of it, and also reused stuff from dirlist.c.

>> +/* HACK HACK */
>> +extern const content_handler html_content_handler;
> 
> right there is why you have got no other reviews by the way, not just
> because it *is* a hack but because you knew it and did nothing about
> it, or at least provided a rational explanation.

That is because I was hoping some cleaner way would be introduced, like a 
content delegate object of some sort. As I said, it's the result of not being 
able to create a content without an llcache backing object; being able to do 
so would have been cleaner.

>> +static const char gopher_faked_type[] = "text/x-gopher-directory";
>> +static lwc_string *gopher_faked_mime_type;
> 
> as a comment we are trying to move away from forward declaration,
> secondly every function should be doccommented somewhere. if its
> static, in its scope here. If its external a comment saying in which
> header it is documented.

I originally forked textplain.c ...

>> +    { 'p', "image/png"}, /* at least on 
>> gopher://namcub.accelera-labs.com/1/pics */
>> +    { 0, NULL }
>> +};
> 
> arghhhh magic letters! no! an enum if you please

OK... They are not really magic though; the protocol explicitly uses these 
letters. I will use an enum next time.
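
For -v2 I have something like this in mind: an enum whose values are the 
protocol's own item-type characters (just a sketch, names not final):

typedef enum {
	GOPHER_TYPE_TEXTPLAIN = '0',	/* text file */
	GOPHER_TYPE_DIRECTORY = '1',	/* menu / directory */
	GOPHER_TYPE_ERROR     = '3',	/* error item */
	GOPHER_TYPE_QUERY     = '7',	/* search query */
	GOPHER_TYPE_GIF       = 'g',
	GOPHER_TYPE_IMAGE     = 'I',
	GOPHER_TYPE_PNG       = 'p'	/* non-standard, seen in the wild */
} gopher_item_type;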

>> +
>> +/**
>> + * Create a CONTENT_GOPHER.
> 
> there are six parameters here...

Right, pure laziness, I admit.

>> +static char *html_escape_string(char *str)
>> +{
>> +}
> 
> BLETCH
> 
> is there not already a utility function to do this, and if not create
> one rather than spooging this in and really this ought to be less
> iffy.

I didn't find one at first glance, so I just stripped things out of 
gen_nice_title to get it done.
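
What I would move to utils.c as a shared helper is roughly this; the body is 
just a sketch (caller frees the result), not what the patch currently does:

#include <stdlib.h>
#include <string.h>

/* Escape the characters that are special in HTML text and attributes. */
static char *html_escape_string(const char *str)
{
	/* worst case: every character becomes "&quot;" (6 bytes) */
	char *out = malloc(strlen(str) * 6 + 1);
	char *q = out;
	const char *p;

	if (out == NULL)
		return NULL;

	for (p = str; *p != '\0'; p++) {
		switch (*p) {
		case '<':  memcpy(q, "&lt;", 4);   q += 4; break;
		case '>':  memcpy(q, "&gt;", 4);   q += 4; break;
		case '&':  memcpy(q, "&amp;", 5);  q += 5; break;
		case '"':  memcpy(q, "&quot;", 6); q += 6; break;
		default:   *q++ = *p;              break;
		}
	}
	*q = '\0';
	return out;
}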

>> +
>> +
>> +static char *gen_nice_title(const char *path)
>> +{
> 
> didnt you just have this code above?

No, this one comes from dirlist.c, and gave birth to the one above. I agree it 
should probably be moved to utils.c.

>> + *
>> + *     gopher_generate_top()
>> + *     gopher_generate_title()
>> + *     gopher_generate_row()           -- call 'n' times for 'n' rows
>> + *     gopher_generate_bottom()
> 
> document this in the caller, not in every function

I originally forked textplain.c ...
but yeah.

> 
>> +                    /*"<!-- base href=\"%s\" -->\n"*//* XXX: needs the 
>> content url */
>> +                    /* Don't do that:
>> +                     * seems to trigger a reparsing of the gopher data 
>> itself as html...
>> +                     * "<meta http-equiv=\"Content-Type\" 
>> content=\"text/html; charset=UTF-8\" />\n"
>> +                     */
> 
> yuk

What?
Those are just failed attempts.

>> +                    /* TODO: move this to clean CSS in internal.css */
> 
> well why have you not?

Actually I did; I just forgot to remove the TODO.

>> +static bool gopher_generate_row_internal(char type, char *fields[5],
> 
> whats this magic number? not an enum so fields[] has meaning?

Hmm ok...

>> +    /* XXX: outputting \n generates better looking html code,
>> +     * but currently screws up indentation due to a bug.
> 
> 
> what bug? why isnt it fixed?

I don't think it has been fixed.
It's visible on:
http://revolf.free.fr/beos/shots/shot_netsurf_gopher_vs_fx2_001.png
where lines like "(plus using..." and "Get your own..." are shifted to the 
right by a stray whitespace.

I don't know enough about the HTML renderer to fix that.

>> +     */
>> +#define HTML_LF 
>> +/*#define HTML_LF "\n"*/
>> +
> 
> yuk

until someone fixes it :p

>> +    case '0':       /* text/plain link */
>> +            error = snprintf(buffer, buffer_length,
>> +                            "<a href=\"gopher://%s%s%s/%c%s\";>"HTML_LF
>> +                            "<span class=\"text\">%s</span></a>"HTML_LF
>> +                            "<br/>"HTML_LF,
>> +                            fields[2],
>> +                            alt_port ? ":" : "",
>> +                            alt_port ? fields[3] : "",
>> +                            type, fields[1], nice_text);
> 
> way too much magic in one printf.

I can do much worse :P

>> +            error = snprintf(buffer, buffer_length,
>> +                            "<form method=\"get\" 
>> action=\"gopher://%s%s%s/%c%s\";>"HTML_LF
>> +                            "<span class=\"query\">"
>> +                            "<label>%s "
>> +                            "<input name=\"\" type=\"text\" align=\"right\" 
>> />"
>> +                            "</label>"
>> +                            "</span></form>"HTML_LF
>> +                            "<br/>"HTML_LF,
>> +                            fields[2],
>> +                            alt_port ? ":" : "",
>> +                            alt_port ? fields[3] : "",
>> +                            type, fields[1], nice_text);
> 
> 
> seriously? what? someone else might have to read this stuff, please, less 
> and more readable 

That's what I meant when I said I could do much worse :D
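
More seriously, for -v2 I would at least build the item URL once in a helper, 
so each case only formats its own markup. Something like this (names 
hypothetical):

#include <stdio.h>

/* Format a gopher item URL; port is NULL when the default port is used. */
static int gopher_item_url(char *buf, size_t len, char type,
		const char *host, const char *port, const char *selector)
{
	if (port != NULL)
		return snprintf(buf, len, "gopher://%s:%s/%c%s",
				host, port, type, selector);
	return snprintf(buf, len, "gopher://%s/%c%s", host, type, selector);
}

Each case in the row generator would then boil down to a single snprintf() of 
its own markup using the ready-made URL and nice_text.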

>> +    case 'g':
>> +    case 'I':
>> +    case 'p':
>> +            /* quite dangerous, cf. gopher://namcub.accela-labs.com/1/pics 
>> */
> 
> 
> dangerous? explain!

Some search queries return gopher directories with a lot of images, which 
tend to stall the GUI while fetching them if we enable inlining them (with 
<img>).
Some clients, like the OverbiteFF Firefox plugin, inline them inside an iframe 
that is only shown (and fetched) when clicked, but we can't do that yet.

An alternative would be to only inline the first N images...
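
Something like this, say; the limit and the names are made up, and the counter 
would live in the per-page state rather than a global:

#include <stdbool.h>

#define GOPHER_INLINE_IMAGES_MAX 8	/* arbitrary limit for illustration */

static int inlined_images;	/* reset to 0 when a new page starts */

/* Decide whether this image item gets an <img> tag or a plain link. */
static bool gopher_should_inline_image(void)
{
	if (inlined_images >= GOPHER_INLINE_IMAGES_MAX)
		return false;	/* past the limit: emit a plain <a href> instead */
	inlined_images++;
	return true;
}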

>> +    if (error < 0 || error >= buffer_length)
>> +            /* Error or buffer too small */
>> +            return false;
>> +    else
>> +            /* OK */
>> +            return true;
>> +}
> 
> overall this function was too long, too complex, too many "magic" values

It kinda grew organically...

>> +                    if (type == '.' && field == 0 && p == s) {
>> +                            ;/* XXX: signal end of page? For now we just 
>> ignore it. */
>> +                    }
> 
> is that supposed to be a todo?

No, it's an undecided part; it seems we don't really need it, but answering 
this was part of what the review was for.

>> +#ifndef _NETSURF_RENDER_GOPHER_H_
>> +#define _NETSURF_RENDER_GOPHER_H_
> 
> it has recently been pointed out to us that macros beginning _ belong
> to the compiler so we ought not to use them plain

Rather, to the OS vendor IIRC, but yeah.

> NETSURF_RENDER_GOPHER_H would be ok

netsurf/render/textplain.h should be fixed then...

>> Index: render/html.c
>> ===================================================================
>> --- render/html.c    (revision 12321)
>> +++ render/html.c    (working copy)
>> @@ -101,7 +101,7 @@
>>              unsigned int depth);
>> #endif
>> 
>> -static const content_handler html_content_handler = {
>> +const content_handler html_content_handler = {
> 
> very rude word

"... not being able to create a content object without llcache object ..."

> is this useful in the generic url handling?

You tell me; that's what the review is for.
I don't think it has any use outside gopher, but it seemed more related to URL 
handling, so...

>>      { "http_proxy_auth_pass", OPTION_STRING, &option_http_proxy_auth_pass },
>> +    { "gopher_inline_images", OPTION_BOOL,  &option_gopher_inline_images },
>>      { "font_size",          OPTION_INTEGER, &option_font_size },
>>      { "font_min_size",      OPTION_INTEGER, &option_font_min_size },
>>      { "font_sans",          OPTION_STRING,  &option_font_sans },
> 
> would this not have been better as an option to extend the generic
> directory listing to show images inline?

Maybe, but as I said, doing this for gopher is dangerous, at least in the 
current state.

> various eww in the curl handler...it really is supposed to be only for http(s)

Then why is it named *curl.c*? ;-)

> 
>> 
>> +void * fetch_curl_setup_gopher(struct fetch *parent_fetch, const char *url,
>> +             bool only_2xx, const char *post_urlenc,
>> +             const struct fetch_multipart_data *post_multipart,
>> +             const char **headers)
>> +{
>> +    struct curl_fetch_info *f;
>> +    const char *mime;
>> +    char type;
>> +    f = fetch_curl_setup(parent_fetch, url, only_2xx, post_urlenc,
>> +             post_multipart, headers);
>> +    if (url_gopher_type(url, &type) == URL_FUNC_OK && type) {
>> +            f->gopher_type = type;
>> +    } else {
>> +            f->http_code = 404;
>> +            fetch_set_http_code(f->fetch_handle, f->http_code);
>> +    }
>> +
>> +    mime = gopher_type_to_mime(type);
>> +    /* TODO: add a fetch_mimetype_by_ext() or fetch_mimetype_sniff_data() */
> 
> err we already have a way of doing this for file: perhaps you can use that

Right, by extension, though I'm not sure every platform's code actually likes 
being called with non-existing files; it isn't obvious from the documentation.
For example, the BeOS code tries to open the file to read its MIME type 
attribute.
I added some hardcoded types for known extensions before noticing the 
BMimeType::GuessMimeType() methods that take a filename and a data buffer.
Maybe we should have one for real files and one that only checks extensions?
OTOH we could just assume the file might not exist and fall back to extension 
matching in that case.
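
What I mean by extension-only matching is roughly this; the table and the 
function name are made up for illustration:

#include <string.h>
#include <strings.h>

/* Map a file extension to a MIME type without touching the filesystem. */
static const char *mimetype_from_extension(const char *path)
{
	static const struct { const char *ext; const char *mime; } map[] = {
		{ ".html", "text/html" },
		{ ".txt",  "text/plain" },
		{ ".gif",  "image/gif" },
		{ ".png",  "image/png" },
		{ ".jpg",  "image/jpeg" },
		{ NULL, NULL }
	};
	const char *dot = strrchr(path, '.');
	int i;

	if (dot == NULL)
		return "text/plain";	/* no extension: assume text */
	for (i = 0; map[i].ext != NULL; i++)
		if (strcasecmp(dot, map[i].ext) == 0)
			return map[i].mime;
	return "application/octet-stream";
}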

>> @@ -771,6 +820,8 @@
>>      LOG(("done %s", f->url));
>> 
>>      if (abort_fetch == false && result == CURLE_OK) {
>> +            //if (f->gopher_type)
>> +                    //fetch_curl_gopher_data(NULL, 0, 0, f);
>>              /* fetch completed normally */
>>              if (f->stopped ||
>>                              (!f->had_headers &&
> 
> this is C not C++, appropriate comment syntax please and if its not
> useful *remove* it

Dead code leftover.

>> +                            /* TODO: try to guess better from the string ?
>> +                             * like "3 '/bcd' doesn't exist!"
>> +                             * TODO: what about other file types ?
>> +                             */
>> +                            f->http_code = 404;
> 
> hmm do we not have an enum for there? bad project, no biscuit

I like numbers :p

François.
