Re: Session Info Database API concepts [regarding wget and gsoc]

Micah Cowan Wed, 26 Mar 2008 09:57:34 -0700

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Siddhant Goel wrote:
> 
>     Thanks, Siddhant; I think this version is much improved over the
>     previous attempt.
> 
> 
> Thank *you*! I have revised it again and it's waiting for you! :)
> 
> 
> 
>     I'd probably try not to get into the details of what the session db file
>     might look like; that's probably one of the things we'd need to define
>     as part of our initial discussions. I'm guessing that the list of
>     entries in your abstract are probably intended as only examples of the
>     _kind_ of data that would go in there, rather than the particulars of
>     how that data should look, but if I were you I'd say that explicitly.
>     (Also, that sort of information is probably not so appropriate in the
>     "Abstract", which is usually intended to be a very high-level
>     description of what the feature is).
> 
> I have now mentioned that the list I am providing gives a fair enough
> *idea* of what needs to be implemented.
> Also, I agree that the list should be in the How/Deliverables
> section, rather than in the Abstract. Corrected.


Actually, it looks like it's still in the Abstract. The GSoC application
will have two sections for your proposal text: Abstract, and Details.
Only the Details section is used to define what you're actually
responsible for, and is what your performance will be compared against.
So anything that's part of your actual proposal should go there.

My preference is actually that you'd avoid too much detail on these
entries and what they might look like, and focus on describing the exact
operations that Wget will be able to do with the information from a
session info db.

Similarly, the names of the files you'll write, and the names of
functions you'll put in them, and really any mention of low-level
implementation details, are not really appropriate for the proposal:
_how_ you write it isn't particularly relevant IMO, and it may be unwise
to constrain yourself with regard to those things; _what_ you write, and
a high-level view of how it works, are what's relevant to me.

> For this issue, I have made two sections in an entry. A download_status
> value (which should be 1 if the download was successful, and 0 if it was
> not), and a status_reason value (which, as I see it, will contain
> specific codes for specific failure reasons. 1 could be for a 404, 2
> could be for a 403, 3 could be for a name resolution error, and so on).

I'm not sure I like that, but that's a detail we could discuss later, if
your proposal is accepted.

>     It'd be really nice to touch on a little bit of _how_ Wget will handle
>     looking up a local pathname from a chain of redirects (i.e., if you're
>     not going to do it by using grep for each download, how _will_ you
>     accomplish it?).
> 
> Yes. It would surely be nice. I have added a simple procedure in the
> How/Deliverables section, to depict what's going on in my mind when I am
> thinking of a solution to this feature.

I'm actually looking for something a bit different. Things like
opening/reading the file, and using sizeof where appropriate, are pretty
much implied in any solution that uses I/O (and actually, I don't think
sizeof will end up playing much of a role).

It's a little hard to tell, but the procedure you describe sounds like a
linear traversal of the whole session db for each lookup operation.
That's not really optimal (and is the algorithmic equivalent of just
using grep for each lookup); I'd ideally be looking for a solution that
takes a little time to construct some appropriate structures after the
file is opened and read, but which then allows the actual lookups to be
very quick: at worst proportional to the logarithm of the number of
entries in the db; constant-time lookups (or, more accurately, lookups
proportional to the length of the string being looked up) are also
possible/welcome.
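
To illustrate the kind of structure described above, here is a minimal
sketch in Python (not Wget's C, and not taken from the proposal). It
assumes each session db entry can be reduced to a (url, redirect target,
local path) triple; that layout, the SessionIndex class and its
local_path method are all hypothetical names chosen for the example. The
index is built once after the file is read, and each lookup then follows
the redirect chain through hash tables instead of rescanning the file.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical entry shape: (url, redirect_target_or_None, local_path_or_None)
Entry = Tuple[str, Optional[str], Optional[str]]


class SessionIndex:
    """In-memory index built once from the session db entries."""

    def __init__(self, entries: Iterable[Entry]) -> None:
        self._redirects: Dict[str, str] = {}
        self._paths: Dict[str, str] = {}
        # One linear pass over the entries, done a single time at load.
        for url, target, path in entries:
            if target is not None:
                self._redirects[url] = target
            if path is not None:
                self._paths[url] = path

    def local_path(self, url: str) -> Optional[str]:
        """Follow the redirect chain from url and return the local path
        of the final URL, or None if nothing was saved for it."""
        seen = set()
        # Each step is a hash lookup; 'seen' guards against redirect loops.
        while url in self._redirects and url not in seen:
            seen.add(url)
            url = self._redirects[url]
        return self._paths.get(url)


index = SessionIndex([
    ("http://example.com/", "http://example.com/home", None),
    ("http://example.com/home", "http://example.com/home/", None),
    ("http://example.com/home/", None, "example.com/home/index.html"),
])
print(index.local_path("http://example.com/"))
```

With hash tables, the cost of a lookup depends on the length of the
redirect chain and of the URLs involved, not on the number of entries in
the db. A sorted array with binary search would give the logarithmic
variant.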

>     It would also be good to talk a little about compatibility issues, such
>     as how Wget could handle session dbs that were generated from newer
>     versions of Wget, that might specify information that Wget doesn't know
>     how to process.
> 
> I didn't quite get that. My proposal talks about a slight code addition
> to wget in its future releases, so older versions wouldn't have a
> session info feature. Please correct me if I'm wrong.

No, you're right. But what about newer versions of Wget, which may add
new information and features to the session db file format? Ideally, the
"older" version of Wget (the one that uses your first version of the
session db file) should understand the information it needs to
understand, ignore the information it doesn't need and doesn't know how
to understand, and possibly fail if there's information it doesn't know
how to understand, but needs to. There should also be a way to indicate
structural changes to the session db format, such that the file can
indicate that Wget should be at least version "X" to read the file.
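
As a rough illustration of these compatibility rules, here is a minimal
sketch in Python. The file format is entirely made up for the example: a
first line "min-version N", then one entry per line as tab-separated
key=value fields, where a leading "!" on a key marks a field the reader
must understand. None of this is an actual Wget format, and the names
parse_session_db, KNOWN_FIELDS and IncompatibleSessionDb are
hypothetical.

```python
SUPPORTED_FORMAT = 1                      # format version this reader implements
KNOWN_FIELDS = {"url", "redirect", "path"}


class IncompatibleSessionDb(Exception):
    """The file needs a newer reader than this one."""


def parse_session_db(lines):
    lines = iter(lines)
    # Structural changes are signalled by the minimum reader version.
    tag, _, required = next(lines, "").strip().partition(" ")
    if tag != "min-version" or not required.isdigit():
        raise IncompatibleSessionDb("missing or malformed header")
    if int(required) > SUPPORTED_FORMAT:
        raise IncompatibleSessionDb(
            f"file needs format {required}, reader supports {SUPPORTED_FORMAT}")

    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        entry = {}
        for field in line.split("\t"):
            key, _, value = field.partition("=")
            critical = key.startswith("!")
            name = key[1:] if critical else key
            if name in KNOWN_FIELDS:
                entry[name] = value        # understood: keep it
            elif critical:
                # Unknown, but marked must-understand: refuse to continue.
                raise IncompatibleSessionDb(f"unknown required field {name!r}")
            # Unknown and optional: silently ignored.
        entries.append(entry)
    return entries


db = ["min-version 1\n",
      "url=http://example.com/\tpath=example.com/index.html\tcolor=blue\n"]
print(parse_session_db(db))
```

Here the unknown optional field "color" is dropped, while a field such
as "!signature=..." would make this reader stop rather than guess.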

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFH6oBj7M8hyUobTrERAhLbAKCLnaE7kphDcZxG9s/i0HWSEhN1dwCgjF2H
1jNHW5fXJ7J7HmLrKyVaZpY=
=gy9u
-----END PGP SIGNATURE-----
