-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Charles wrote:
> On Mon, Mar 17, 2008 at 4:41 PM, Micah Cowan <[EMAIL PROTECTED]> wrote:
>>  Is that true? I thought wget actually read the input file in a streaming
>>  fashion.
> 
> If that is the case, then I think it's possible to add links to the
> list while wget is already running.
> 
>>  I don't expect that a single session's database would get frequent
>>  reuse, though. However, it probably _would_ be used repeatedly while
>>  you're working on a specific session; in that case, it's useful to have
>>  the binary format.
> 
> A session database! :D So I have misunderstood this database thing. I
> thought it was something like a central repository in the user's home
> (like .wget-history) that records all the links that have been
> downloaded, with all their meta-information. Maybe a better name is a
> project file, or a session file, but calling it a database would be
> too much ... :D. For session information, an ini file is
> sufficient IMO.

Well, but it _is_ a database. And ini file makes me think of
initialization, which isn't quite what this is (AIUI, .wgetrc on Windows
is wget.ini or somesuch).

By "Session" Database, I meant that it's a database of information on a
given session, and not a database of sessions. Finding a way to clarify
that could be a good idea, though nothing comes to mind.

Perhaps "session info file" is better, as it seems (to me) to convey a
stronger understanding that it's information about a single session
(though it's still not unambiguous); and avoids the word "database",
which a couple people at least have already confused with RDBMSs, I think.

>>  However, it's important to be able to parse the file, even if there is
>>  some corruption or malformed information in some places--and especially,
>>  if it is truncated (Wget abruptly killed).
> 
> YAML is safe for this, I think. libyaml implements a YAML scanner.
> If the scanner fails at some point in the session file, we can consider
> all points forward as invalid.
>
> And since YAML is composed of
> line-per-line information, the worst we will get is missing a line of
> information, instead of losing all the information in the file.

I'm confused by what you're saying here. "All points forward" to me
means... "all points forward". I think you're actually saying "up to the
newline character", since that's the only way we could miss only one line.
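
To make the "lose at most one line" reading concrete, here is a minimal
sketch in C, assuming a record-per-line file. It counts only
newline-terminated lines as valid and treats a trailing fragment as
truncated. The function name count_complete_lines is hypothetical and not
part of wget, and this is not how libyaml's scanner behaves; it only
illustrates the line-level recovery idea.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not wget code): counts complete, newline-terminated
   lines in a stream. A final fragment with no '\n' is assumed to come from
   an interrupted write; it is not counted, and *truncated is set to 1. */
static size_t count_complete_lines(FILE *fp, int *truncated)
{
    char buf[4096];
    size_t complete = 0;
    int in_partial = 0;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t len = strlen(buf);
        if (len > 0 && buf[len - 1] == '\n') {
            /* End of a line (possibly the tail of a very long one). */
            complete++;
            in_partial = 0;
        } else {
            /* Either a long line still in progress, or a truncated tail. */
            in_partial = 1;
        }
    }
    *truncated = in_partial;
    return complete;
}
```

A real reader would parse each complete line rather than count it; the
point is only that everything before the last newline remains usable.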

> To prevent losing data, wget has to write to the session information
> frequently, but frequent writing will burden the hard disk.

I think you and I have different ideas of what "frequent writing" is. It
certainly won't be nearly as frequent as writing to the logfile already
is. It'd obviously want to be line-buffered, and flushing for every line
is probably a good idea (this usually does not actually result in an
immediate disk write anyway, thanks to "shortcuts" taken by nearly all
disks).

> I wonder
> if a memory-mapped file can help with this.
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2044.html
> says that memory-mapped files have the feature of "Automatic file data
> synchronization and cache from the OS". If the wget process is suddenly
> killed, the task of synchronizing memory and disk content will be done
> by the OS, CMIIW, so we won't lose any data.

I really don't see how that would save us anything over buffered I/O; in
particular, flushing at line boundaries (which the C standard library
will do automatically for us, if we ask it to) will prevent wget from
writing out incomplete lines, which to me is a good thing.

Existing applications write line-buffered data out to log files at a
_much_ higher rate than wget will be doing.
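
For reference, asking the C library to flush at line boundaries is a
single setvbuf() call with _IOLBF. The following is only a sketch; the
helper names open_session_log and log_record, and the "key: value" record
layout, are assumptions for illustration and not an agreed format.

```c
#include <stdio.h>

/* Hypothetical helper: opens a session info file for appending and asks
   stdio to flush whenever a newline is written. Returns NULL on failure. */
static FILE *open_session_log(const char *path)
{
    FILE *fp = fopen(path, "a");
    if (fp == NULL)
        return NULL;
    /* _IOLBF selects line buffering; a NULL buffer lets stdio allocate it.
       setvbuf must be called before any other operation on the stream. */
    if (setvbuf(fp, NULL, _IOLBF, BUFSIZ) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}

/* Hypothetical helper: writes one "key: value" record as a single line.
   Returns 0 on success, -1 on a write error. */
static int log_record(FILE *fp, const char *key, const char *value)
{
    return fprintf(fp, "%s: %s\n", key, value) < 0 ? -1 : 0;
}
```

Two caveats: a line longer than the stdio buffer may still be written out
in pieces, and a flush only hands the data to the OS. If a record has to
survive a power failure rather than just a killed process, something like
fsync() would be needed as well.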

>>  Still, I imagine the problem is easily fixed by placing some line at the
>>  end of the file to indicate completion.
> 
> Wget completion timestamp would fit it.
> 
> Regarding libyaml's stability: even though it's alpha-quality software
> at version 0.0.1, it is already distributed with its stable
> counterpart pyyaml (which is implemented in Python [1]), so I think it
> is usable. By the time this session database feature of wget gets
> implemented, libyaml could have reached its production release, so both
> can run together well, I guess.

Is it being actively developed? 0.0.1 seems to have been released mid-2006.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFH3qpT7M8hyUobTrERAp6BAKCQznexvKEVHCKxzvk9LBkqvXJSQgCfX2Pt
lV8wgjXK+qoPvsoO62sXips=
=Mz8X
-----END PGP SIGNATURE-----
