-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Siddhant Goel wrote:
> > Thanks, Siddhant; I think this version is much improved over the
> > previous attempt.
>
> Thank *you*! I have revised it again and it's waiting for you! :)
>
> > I'd probably try not to get into the details of what the session db file
> > might look like; that's probably one of the things we'd need to define
> > as part of our initial discussions. I'm guessing that the list of
> > entries in your abstract are probably intended as only examples of the
> > _kind_ of data that would go in there, rather than the particulars of
> > how that data should look, but if I were you I'd say that explicitly.
> > (Also, that sort of information is probably not so appropriate in the
> > "Abstract", which is usually intended to be a very high-level
> > description of what the feature is).
>
> I have now mentioned that the list I am providing gives a fair enough
> *idea* of what needs to be implemented.
> Also, I agree that that list should be there in the How/Deliverables
> section, rather than being in the Abstract. Corrected.

Actually, it looks like it's still in the Abstract. The GSoC
application will have two sections for your proposal text: Abstract,
and Details. Only the Details section is used to define what you're
actually responsible for, and is what your performance will be compared
against. So anything that's part of your actual proposal should go
there.

My preference is actually that you'd avoid too much detail on these
entries and what they might look like, and focus on describing the
exact operations that Wget will be able to do with the information from
a session info db. Similarly, the names of the files you'll write, and
the names of functions you'll put in them, and really any mention of
low-level implementation details, are not really appropriate for the
proposal: _how_ you write it isn't particularly relevant IMO, and it
may be unwise to constrain yourself in regards to those things; _what_
you write, and a high-level view of how it works, are what's relevant,
to me.

> For this issue, I have made two sections in an entry. A download_status
> value (which should be 1 if the download was successful, and 0 if it was
> not), and a status_reason value (which, as I see it, will contain
> specific codes for specific failure reasons. 1 could be for a 404, 2
> could be for a 403, 3 could be for a name resolution error, and so on).

I'm not sure I like that, but that's a detail we could discuss later,
if your proposal is accepted.

> > It'd be really nice to touch on a little bit of _how_ Wget will handle
> > looking up a local pathname from a chain of redirects (i.e., if you're
> > not going to do it by using grep for each download, how _will_ you
> > accomplish it?).
>
> Yes. It would surely be nice. I have added a simple procedure in the
> How/Deliverables section, to depict what's going on in my mind when I am
> thinking of a solution to this feature.

I'm actually looking for something a bit different.
Things like opening/reading the file, and using sizeof where
appropriate, are pretty much implied in any solution that uses I/O (and
actually, I don't think sizeof will end up playing much of a role).

It's a little hard to tell, but the procedure you describe sounds like a
linear traversal of the whole session db for each lookup operation.
That's not really optimal (and is the algorithmic equivalent to just
using grep for each lookup); I'd ideally be looking for a solution that
takes a little time to construct some appropriate structures after the
file is opened and read, but which then allows the actual lookups to be
very quick: no worse than proportional to the logarithm of the number
of entries in the db; constant-time lookups (or, more accurately,
lookups proportional to the length of the string being looked up) are
also possible/welcome.

> > It would also be good to talk a little about compatibility issues, such
> > as how Wget could handle session dbs that were generated from newer
> > versions of Wget, that might specify information that Wget doesn't know
> > how to process.
>
> I didn't quite get that. My proposal talks about a slight code addition
> to wget in its future releases, so older versions wouldn't have a
> session info feature. Please correct me if I'm wrong.

No, you're right. But what about newer versions of Wget, which may add
new information and features to the session db file format? Ideally,
the "older" version of Wget (the one that uses your first version of
the session db file) should understand the information it needs to
understand, ignore the information it doesn't need to and doesn't know
how to understand, and possibly fail if there's information it doesn't
know how to understand, but needs to (there should also be a way to
indicate structural changes to the session db format, such that the
file can indicate that Wget should be at least version "X" to read the
file).

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFH6oBj7M8hyUobTrERAhLbAKCLnaE7kphDcZxG9s/i0HWSEhN1dwCgjF2H
1jNHW5fXJ7J7HmLrKyVaZpY=
=gy9u
-----END PGP SIGNATURE-----
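
As an illustration of the lookup strategy discussed above (spend a
little time building a structure once the file has been read, then make
each lookup fast), here is a minimal sketch in C. Everything in it is an
assumption made for illustration: the struct entry layout, its field
names, and the functions index_build and lookup_local are hypothetical
and are not part of Wget or of the proposal. It sorts the entries by URL
once, then follows a redirect chain with one binary search per hop, so
each hop costs time proportional to the logarithm of the number of
entries. A hash table keyed on the URL would be the natural substitute
if lookups proportional to the length of the URL are wanted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of one session db entry (not a Wget type). */
struct entry {
    const char *url;      /* lookup key */
    const char *redirect; /* target URL if this URL redirected, else NULL */
    const char *local;    /* local pathname if the URL was saved, else NULL */
};

/* Order entries by URL; shared by qsort() and bsearch(). */
static int cmp_entry(const void *a, const void *b)
{
    const struct entry *ea = a, *eb = b;
    return strcmp(ea->url, eb->url);
}

/* One-time setup after the whole file has been read: sort by URL. */
void index_build(struct entry *tab, size_t n)
{
    qsort(tab, n, sizeof *tab, cmp_entry);
}

/*
 * Follow redirects starting at url until an entry without a redirect is
 * found, doing one binary search per hop. max_hops bounds the number of
 * searches, which also guards against redirect loops. Returns the local
 * pathname, or NULL if the URL is unknown, was never saved, or the chain
 * is longer than max_hops.
 */
const char *lookup_local(const struct entry *tab, size_t n,
                         const char *url, int max_hops)
{
    assert(url != NULL);
    while (max_hops-- > 0) {
        struct entry key = { url, NULL, NULL };
        const struct entry *e = bsearch(&key, tab, n, sizeof *tab, cmp_entry);
        if (e == NULL)
            return NULL;          /* URL not present in the session db */
        if (e->redirect == NULL)
            return e->local;      /* end of the chain */
        url = e->redirect;        /* hop to the redirect target */
    }
    return NULL;                  /* chain too long, or a loop */
}
```

With three entries where http://example.com/old redirects to
http://example.com/new, which redirects to http://example.com/final,
and the last was saved as example.com/final.html, calling index_build
on the table and then lookup_local(tab, 3, "http://example.com/old", 8)
returns that pathname after three searches.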