stayondomain/host, scheduler, memory leak
heya,

I've just started using Plucker Desktop (Version 1.2.0.0, Build Date - Sep 14 2002), and searched through this list but have not been able to find any mention of these problems...

a) I've used the built-in HTML editor to build my Home page. This is one of my lines:

    <a href="http://mobile.theonion.com" maxdepth=2 stayondomain>The Onion</a>

The switches stayondomain and stayonhost don't seem to work: it still plucks pages beyond what I set it to do. When my logs were set to extensive, this error shows up:

    Processing http://mobile.theonion.com/...
    Retrieved ok.
    Ignoring invalid link attribute 'stayondomain'
    Ignoring invalid link attribute 'stayondomain'
    Ignoring invalid link attribute 'stayondomain'

b) Scheduler: I've set a channel to update at 5am. Because there's *quite* a lot of stuff to download (1000+ items), it takes quite a bit of time. This results in Plucker Desktop loading up another instance of the plucking software to pluck the exact same channel (it says "Channel is due"). Is there something that stops Plucker Desktop from loading another instance of the same channel?

c) This is a result of b). As it loads up a few instances of Plucker (I tend to leave my computer while it's plucking), the memory usage starts ballooning. However, when I close the extra instances, the memory doesn't seem to be freed.

That's it! This is fantastic software! Thanks, and if I can get the Salon pages on Plucker, I'm all set to dump AvantGo!

--
- R
love me or hate me, just spare me your indifference... - Unknown

___
plucker-list mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-list
Re: stayondomain/host, scheduler, memory leak
At 11:59 PM 9/26/2002 +0800, Richard Kang wrote:
> heya, I've just started using Plucker Desktop (Version 1.2.0.0, Build Date - Sep 14 2002), and searched through this list but have not been able to find any mention of these problems...
>
> a) I've used the built-in HTML editor to build my Home page. This is one of my lines:
>
>     <a href="http://mobile.theonion.com" maxdepth=2 stayondomain>The Onion</a>
>
> The switches stayondomain and stayonhost don't seem to work: it still plucks pages beyond what I set it to do. When my logs were set to extensive, this error shows up:
>
>     Processing http://mobile.theonion.com/...
>     Retrieved ok.
>     Ignoring invalid link attribute 'stayondomain'
>     Ignoring invalid link attribute 'stayondomain'
>     Ignoring invalid link attribute 'stayondomain'

I don't see stayondomain in the docs, the help file, or the source. stayonhost is in all of them and should work.

> b) Scheduler: I've set a channel to update at 5am. Because there's *quite* a lot of stuff to download (1000+ items), it takes quite a bit of time. This results in Plucker Desktop loading up another instance of the plucking software to pluck the exact same channel (it says "Channel is due"). Is there something that stops Plucker Desktop from loading another instance of the same channel?

Does this belong in the Desktop, or should it be in the Spider? It seems more appropriate for the parser than for the high-level interface. I could probably get that into the Desktop using a Windows named mutex, but the parser (Spider.py) is written in Python, and I haven't yet worked out how to create a cross-process named mutex or semaphore in Python.

Tony McNamara
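One common portable answer to Tony's open question (a cross-process named mutex in Python) is a lock file created with `os.open` and the `O_CREAT | O_EXCL` flags: the OS creates the file atomically, so only one process can succeed. The sketch below assumes that approach; `named_lock`, `AlreadyRunning`, and the lock-file naming are hypothetical, not anything Spider.py actually uses, and it does not clean up stale lock files left behind if a process is killed.

```python
import os
import tempfile
from contextlib import contextmanager


class AlreadyRunning(Exception):
    """Raised when another process already holds the named lock."""


@contextmanager
def named_lock(name, lock_dir=None):
    """Hypothetical cross-process lock built on an exclusively created file.

    O_CREAT | O_EXCL makes creation atomic: if the file already exists,
    os.open fails, so at most one process holds the lock at a time.
    """
    lock_dir = lock_dir or tempfile.gettempdir()
    path = os.path.join(lock_dir, name + ".lock")
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise AlreadyRunning(path) from None
    try:
        try:
            # Record our PID so a stale lock can be diagnosed by hand.
            os.write(fd, str(os.getpid()).encode("ascii"))
        finally:
            os.close(fd)
        yield path
    finally:
        # Release the lock so the next scheduled run can start.
        os.remove(path)
```

A scheduled run would wrap its spider call in `with named_lock("channel-theonion"):` and catch `AlreadyRunning` to skip an update that is already in progress. A sturdier version would read the stored PID and check whether that process still exists before treating the lock as held.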
Re: stayondomain/host, scheduler, memory leak
>> The switches stayondomain and stayonhost don't seem to work: it still plucks pages beyond what I set it to do. When my logs were set to extensive, this error shows up:
>>
>>     Processing http://mobile.theonion.com/...
>>     Retrieved ok.
>>     Ignoring invalid link attribute 'stayondomain'
>>     Ignoring invalid link attribute 'stayondomain'
>>     Ignoring invalid link attribute 'stayondomain'
>
> I don't see stayondomain in the docs, the help file, or the source. stayonhost is in all of them and should work.

Tony is bang on. 'stayondomain' doesn't exist yet. It has been on the parser wishlist for about a year, but no one skilled enough has had a chance to implement it yet. Any takers? I will ship a fresh bottle of Bailey's to anyone who can help out ;-)

>> b) Scheduler: I've set a channel to update at 5am. Because there's *quite* a lot of stuff to download (1000+ items), it takes quite a bit of time. This results in Plucker Desktop loading up another instance of the plucking software to pluck the exact same channel (it says "Channel is due"). Is there something that stops Plucker Desktop from loading another instance of the same channel?

If you are using the progress dialog, then a new process won't pop up during the update (though this was fixed only recently). And if you are using either the progress dialog or console progress from the command-line plucker-desktop, it should be impossible for this to happen, since there is no timer running, and hence no possibility of starting a new update batch.

It will occur, however (currently), if you are using console progress windows and plucker-desktop in non-command-line mode. Because the console windows are asynchronous, there is no way for Plucker Desktop to know when the final channel in the batch has terminated (and hence when it is safe to turn the faucet back on and allow a new update to start).
In the progress dialog, we do track the termination of each update: there they are piped processes, so we get an event notification when each process terminates. I am still thinking about the best approach for the console windows. The best I have come up with so far is a message dialog that says 'click here when you are done', which allows a new autoupdate to start again once you click it.

Memory usage: the distiller uses a good deal of memory, since there are a lot of links and a lot of information to manage. Stopping a second instance from starting, as you mention, is the best solution.

Best wishes,
Robert

--
MedicalMnemonics.com
A free non-profit online searchable database of medical mnemonics to help remember the important details.
http://www.medicalmnemonics.com
[EMAIL PROTECTED]
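Robert's point is that a parent can only hold back a new update if it keeps a handle to the child it launched, which is what the piped progress-dialog processes provide. A minimal Python sketch of that idea using `subprocess.Popen.poll()`, which returns `None` while the child is still running; `ChannelScheduler` and `start_if_due` are hypothetical names, and this is not Plucker Desktop's actual code.

```python
import subprocess


class ChannelScheduler:
    """Hypothetical sketch: start an update only if the previous one for that channel has exited."""

    def __init__(self):
        self._running = {}  # channel name -> subprocess.Popen of its last update

    def is_busy(self, channel):
        proc = self._running.get(channel)
        if proc is None:
            return False
        if proc.poll() is None:  # poll() returns None while the child is still running
            return True
        del self._running[channel]  # child has exited; forget its handle
        return False

    def start_if_due(self, channel, argv):
        """Launch argv for channel; return False and do nothing if an update is still running."""
        if self.is_busy(channel):
            return False
        self._running[channel] = subprocess.Popen(argv)
        return True
```

The console-window case corresponds to launching the child without keeping such a handle, which is why the scheduler cannot tell when that batch has finished.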
Re: stayondomain/host, scheduler, memory leak
> Tony is bang on. 'stayondomain' doesn't exist yet. It has been on the parser wishlist for about a year, but no one skilled enough has had a chance to implement it yet.

I've been asking for it for almost 2 years. In fact, at the exact same time that the stayonhost parameter was created, I asked for stayondomain, which has a very different scope.

> Any takers? I will ship a fresh bottle of Bailey's to anyone who can help out ;-)

How far will you ship that? =) (j/k)

Seriously though, I've put out a few ideas about how to implement it at the parser level: reverse the hostname string, then reverse it back. I'm not sure whether Python has a robust domain validation library, but it's not that hard to splice out the relevant bits and be left with only the domain itself. I'll try to hack up a standalone something this weekend (in Perl, of course) as a proof of concept of how this can be done. Whoever wants to roll that into something the Python distiller can grok, go ahead; I'll split the Bailey's with you =)

d.
perldoc -qa.j | perl -lpe '($_)=m((.*))'
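David's reverse-and-reverse-back idea can be sketched in Python on the hostname's dot-separated labels: reverse them, keep the first N, and reverse back. The function names and the fixed `labels=2` default are assumptions for illustration; a fixed label count breaks on suffixes such as `.co.uk` (which need 3), so a robust version would consult a public-suffix list instead.

```python
from urllib.parse import urlsplit


def registered_domain(host, labels=2):
    """Keep the last `labels` dot-separated labels of host.

    Follows the reverse / slice / reverse-back idea; labels=2 fits
    "theonion.com" but not suffixes like "co.uk".
    """
    parts = host.lower().rstrip(".").split(".")
    reversed_parts = parts[::-1]    # e.g. ["com", "theonion", "mobile"]
    kept = reversed_parts[:labels]  # e.g. ["com", "theonion"]
    return ".".join(kept[::-1])     # e.g. "theonion.com"


def stays_on_domain(start_url, link_url, labels=2):
    """Hypothetical stayondomain check: True if both URLs share a domain."""
    start_host = urlsplit(start_url).hostname or ""
    link_host = urlsplit(link_url).hostname or ""
    if not start_host or not link_host:
        return False  # no host to compare (e.g. a relative or mailto: link)
    return registered_domain(start_host, labels) == registered_domain(link_host, labels)
```

With this check, a crawl starting at `http://mobile.theonion.com/` would follow links to `www.theonion.com` (same domain, different host, which stayonhost would reject) but not to `www.salon.com`.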
Re: stayondomain/host, scheduler, memory leak
On Thu, 26 Sep 2002, David A. Desrosiers wrote:
>> Tony is bang on. 'stayondomain' doesn't exist yet. It has been on the parser wishlist for about a year, but no one skilled enough has had a chance to implement it yet.
>
> I've been asking for it for almost 2 years. In fact, at the exact same time that the stayonhost parameter was created, I asked for stayondomain, which has a very different scope.

Well, I think I have seen home_stayondomain somewhere... Isn't that the option you are looking for?

MB
--
Martin Bodlak, Ostrava, Czech Republic
http://bodlak.hyperlink.cz, [EMAIL PROTECTED]
---
Also visit http://www.palmknihy.cz/
Re: stayondomain/host, scheduler, memory leak
On Thu, Sep 26, 2002, Robert O'Connor wrote:
> It has been on the parser wishlist for about a year, but no one skilled enough has had a chance to implement it yet. Any takers? I will ship a fresh bottle of Bailey's to anyone who can help out ;-)

Then ship it to Australia ;-)

Alice Harris contributed a patch one year ago, but it was never included in the parser...

/Mike
Re: stayondomain/host, scheduler, memory leak
Here is the URL with Alice's kind patch:

http://www.mail-archive.com/plucker-dev@rubberchicken.org/msg01626.html

And don't forget that the patch in that message is the exact reverse of what you'll need to make it work: it was diffed the wrong way round (i.e. all lines with '-' should be '+' and vice versa).

d.
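If you only want to apply the patch, GNU patch's `-R` (`--reverse`) option applies a reversed diff directly. To produce a corrected diff file instead, a minimal Python sketch can swap the two sides of a unified diff: the `---`/`+++` file names, the hunk ranges, and the `-`/`+` line markers. `reverse_unified_diff` is a hypothetical helper; it assumes a well-formed unified diff and does not handle `\ No newline at end of file` markers.

```python
import re

# Matches a unified-diff hunk header such as "@@ -12,7 +12,8 @@ def foo():".
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@(.*)$")


def reverse_unified_diff(text):
    """Swap the old and new sides of a unified diff (sketch only)."""
    lines = text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        # File header pair: the old and new file names trade places.
        if (line.startswith("--- ") and i + 1 < len(lines)
                and lines[i + 1].startswith("+++ ")):
            out.append("--- " + lines[i + 1][4:])
            out.append("+++ " + line[4:])
            i += 2
            continue
        m = HUNK_RE.match(line)
        if m is None:
            out.append(line)  # "diff ...", "Index: ..." and similar pass through
            i += 1
            continue
        old_start, old_len, new_start, new_len, tail = m.groups()
        old_len = 1 if old_len is None else int(old_len)
        new_len = 1 if new_len is None else int(new_len)
        # Hunk header: the old and new ranges trade places.
        out.append(f"@@ -{new_start},{new_len} +{old_start},{old_len} @@{tail}")
        i += 1
        minus, plus = [], []  # pending changed lines, already in reversed form
        left_old, left_new = old_len, new_len
        while (left_old > 0 or left_new > 0) and i < len(lines):
            body = lines[i]
            i += 1
            if body.startswith("-"):
                plus.append("+" + body[1:])  # removed before -> added after reversal
                left_old -= 1
            elif body.startswith("+"):
                minus.append("-" + body[1:])  # added before -> removed after reversal
                left_new -= 1
            else:
                # Context line: flush pending changes, '-' lines before '+' lines.
                out.extend(minus + plus)
                minus, plus = [], []
                out.append(body)
                left_old -= 1
                left_new -= 1
        out.extend(minus + plus)
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")
```

Reversing twice returns the original diff, which is a quick sanity check before posting the corrected patch.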
Re: stayondomain/host, scheduler, memory leak
At 06:48 PM 9/26/2002 -0400, David A. Desrosiers wrote:
> Here is the URL with Alice's kind patch:
> http://www.mail-archive.com/plucker-dev@rubberchicken.org/msg01626.html
>
> And don't forget that the patch in that message is the exact reverse of what you'll need to make it work: it was diffed the wrong way round (i.e. all lines with '-' should be '+' and vice versa).

When the request came through, I dove in to test my new Python skills, and I have a slightly different approach mostly implemented against a fresh CVS checkout. Should I bother finishing it up? To whom would I submit it upon completion?

Tony McNamara
Stayondomain (Was Re: stayondomain/host, scheduler, memory leak)
On 26 Sep 2002 at 15:58, Fringe Ryder wrote:
>> Here is the URL with Alice's kind patch:
>> http://www.mail-archive.com/plucker-dev@rubberchicken.org/msg01626.html
>>
>> And don't forget that the patch in that message is the exact reverse of what you'll need to make it work: it was diffed the wrong way round (i.e. all lines with '-' should be '+' and vice versa).
>
> When the request came through, I dove in to test my new Python skills, and I have a slightly different approach mostly implemented against a fresh CVS checkout. Should I bother finishing it up? To whom would I submit it upon completion?

Bill Janssen is maintaining the parser in CVS, and is the chap to talk to.

If you (or someone else) would by any chance like to tackle another oft-requested feature, it would be support for including/excluding the alt text of images when they aren't a hyperlink:

    # include alt_text
    alt_text=1

    # exclude alt_text
    alt_text=0

This is so that you don't have to sift through a minefield of [img] all over a Plucker document, which often happens when crawling a site not especially designed for handhelds.

Implementation would be something along the lines of supporting the option from the command line and config, like other options. When it comes time to write the [img] or other alt text, it would be wrapped in conditions checking whether it is or isn't a hyperlink, and whether the config switch is on or off.

Best wishes,
Robert
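A minimal sketch of the conditional Robert describes, assuming the `alt_text=1` / `alt_text=0` setting he proposes and using hypothetical function names (not the parser's real ones): linked images always keep their text so the link stays selectable, while unlinked images keep it only when the switch is on. Using "[img]" as the fallback when an image has no alt text is also an assumption.

```python
def parse_alt_text_option(value, default=True):
    """Interpret a hypothetical alt_text=1 / alt_text=0 setting (None means unset)."""
    if value is None:
        return default
    text = str(value).strip().lower()
    if text in ("1", "true", "yes", "on"):
        return True
    if text in ("0", "false", "no", "off"):
        return False
    raise ValueError(f"invalid alt_text value: {value!r}")


def image_alt_text(alt, in_hyperlink, alt_text_enabled):
    """Text to emit in place of an image, or "" to emit nothing.

    Assumes "[img]" is the fallback when the image has no alt text.
    """
    text = alt.strip() if alt and alt.strip() else "[img]"
    if in_hyperlink:
        # Linked images keep their text so the link can still be tapped.
        return text
    # Unlinked images: the alt_text switch decides.
    return text if alt_text_enabled else ""
```

A command-line flag would simply override the value read from the config before it is passed to `parse_alt_text_option`, matching how Robert suggests handling it "like other options".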