-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Charles wrote:
> On Mon, Mar 17, 2008 at 4:41 PM, Micah Cowan <[EMAIL PROTECTED]> wrote:
>> Is that true? I thought wget actually read the input file in a streaming
>> fashion.
>
> If that is the case, then I think it's possible to add links to the
> list while wget is already running.
>
>> I don't expect that a single session's database would get frequent
>> reuse, though. However, it probably _would_ be used repeatedly while
>> you're working on a specific session; in that case, it's useful to have
>> the binary format.
>
> A session database! :D So I have misunderstood this database thing. I
> thought it is something like a central repository in the user's home
> (like .wget-history) that records all the links that have been
> downloaded with all its meta-information. Maybe a better name is a
> project file, or a session file, but calling it a database would have
> been too much ... :D. For a session information, an ini file is
> sufficient IMO.

Well, but it _is_ a database. And ini file makes me think of
initialization, which isn't quite what this is (AIUI, .wgetrc on Windows
is wget.ini or somesuch).

By "Session" Database, I meant that it's a database of information on a
given session, and not a database of sessions. Finding a way to clarify
that could be a good idea, though nothing comes to mind.

Perhaps "session info file" is better, as it seems (to me) to convey a
stronger understanding that it's information about a single session
(though it's still not unambiguous); and avoids the word "database",
which a couple people at least have already confused with RDBMSs, I think.

>> However, it's important to be able to parse the file, even if there is
>> some corruption or malformed information in some places--and especially,
>> if it is truncated (Wget abruptly killed).
>
> YAML is safe for this I think. The libyaml implements a YAML scanner.
> If the scanner failed at a point in the session file, we can consider
> all points forwards as invalid.
>
> And since YAML is composed of
> line-per-line information, the worst we will get is missing a line of
> information, instead of losing all the information in the file.

I'm confused by what you're saying here. "All points forward" to me
means... "all points forward". I think you're actually saying "up to the
newline character", since that's the only way we could miss only one line.

> To prevent losing data, wget has to frequently write to the session
> information, but frequent writing will burden the harddisk.

I think you and I have different ideas of what "frequent writing" is. It
certainly won't be nearly as frequent as writing to the logfile already
is. It'd obviously want to be line-buffered, and flushing for every line
is probably a good idea (this usually does not actually result in an
immediate disk write anyway, thanks to "shortcuts" taken by nearly all
disks).

> I wonder
> if memory mapped file can help with this.
> From
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2044.html, it
> says that memory mapped file has feature of "Automatic file data
> synchronization and cache from the OS". If wget process is suddenly
> killed, the task of synchronizing memory and disk content will be done
> by the OS, CMIIW, so we won't lose any data.

I really don't see how that would save us anything over buffered I/O; in
particular, flushing at line boundaries (which the C standard library
will do automatically for us, if we ask it to) will prevent wget from
writing out incomplete lines, which to me is a good thing.

Existing applications write line-buffered data out to log files at a
_much_ higher rate than wget will be doing.

>> Still, I imagine the problem is easily fixed by placing some line at the
>> end of the file to indicate completion.
>
> Wget completion timestamp would fit it.
>
> Considering libyaml stability. Even though it's alpha quality software
> at version 0.0.1 it is already distributed with its stable
> counterpart pyyaml (which is implemented in Python [1]) so I think it
> is usable. At the time this session database feature of wget gets
> implemented, libyaml could have reached its production release, so both
> can run together well, I guess.

Is it being actively developed? 0.0.1 seems to have been released mid-2006.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFH3qpT7M8hyUobTrERAp6BAKCQznexvKEVHCKxzvk9LBkqvXJSQgCfX2Pt
lV8wgjXK+qoPvsoO62sXips=
=Mz8X
-----END PGP SIGNATURE-----