stayondomain/host, scheduler, memory leak

2002-09-26 Thread Richard Kang

heya,

I've just started using Plucker Desktop (Version 1.2.0.0, Build 
Date - Sep 14 2002), and searched through this list but have not been 
able to find any mention of these problems...

a) I've used the builtin HTML editor to build my Home page. This 
is one of my lines:

<a href="http://mobile.theonion.com" maxdepth=2 stayondomain>The Onion</a>

The switches stayondomain and stayonhost don't seem to 
work. It still plucks pages beyond what I set it to do. When my logs 
were set to extensive, this error shows up:

Processing http://mobile.theonion.com/...
  Retrieved ok.
Ignoring invalid link attribute 'stayondomain'
Ignoring invalid link attribute 'stayondomain'
Ignoring invalid link attribute 'stayondomain'

b) Scheduler - I've set a channel to update at 5am. Due to the 
fact that there's *quite* a lot of stuff to download (about 1000++ 
items), it takes quite a bit of time. This results in Plucker Desktop 
loading up another instance of the plucking software to pluck the exact 
same channel (It says Channel is due). Is there something that stops 
Plucker Desktop from loading another instance of the same channel?

c) This was a result of b). As it loads up a few instances of 
Plucker (I tend to leave my computer when it's plucking), the memory 
usage starts ballooning. However, when I close off the extra instances, 
the memory doesn't seem to return.

That's it! This is a fantastic software! Thanks, and if I can 
get the Salon pages on Plucker, I'm all set to dump Avantgo!

-- 
  - R


love me or hate me,
  just spare me your indifference...
   - Unknown



___
plucker-list mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-list



Re: stayondomain/host, scheduler, memory leak

2002-09-26 Thread Fringe Ryder

At 11:59 PM 9/26/2002  +0800, Richard Kang wrote:
heya,

I've just started using Plucker Desktop (Version 1.2.0.0, Build 
 Date - Sep 14 2002), and searched through this list but have not been 
 able to find any mention of these problems...

a) I've used the builtin HTML editor to build my Home page. This 
 is one of my lines:

<a href="http://mobile.theonion.com" maxdepth=2 stayondomain>The Onion</a>

The switches stayondomain and stayonhost doesn't seem to work. 
 It still plucks pages beyond what I set it to do. When my logs were set 
 to extensive, this error shows up:

Processing http://mobile.theonion.com/...
  Retrieved ok.
Ignoring invalid link attribute 'stayondomain'
Ignoring invalid link attribute 'stayondomain'
Ignoring invalid link attribute 'stayondomain'

I don't see stayondomain in the docs, the helpfile, or the 
source.  stayonhost is in all of them and should work.

   b) Scheduler - I've set a channel to update at 5am. Due to the fact 
 that there's *quite* a lot of stuff to download (about 1000++ items), it 
 takes quite a bit of time. This results in Plucker Desktop loading up 
 another instance of the plucking software to pluck the exact same channel 
 (It says Channel is due). Is there something that stops Plucker Desktop 
 from loading another instance of the same channel?

Does this belong in the Desktop, or should it be in the Spider?   It seems 
more appropriate for the parser rather than the high-level interface.  I 
could probably get that into the Desktop using a Windows named mutex, but 
the parser (Spider.py) is written in Python.  I haven't determined how to 
create a cross-process named mutex or semaphore in Python yet.

 Tony McNamara
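
[Editor's sketch] One portable way to get the cross-process guard Tony describes is an exclusive lock file rather than a named mutex. The following is a minimal sketch in modern Python, not code from Spider.py or Plucker Desktop; the class name ChannelLock, the lock-file naming scheme, and the use of the system temp directory are all assumptions made for illustration.

```python
import os
import tempfile


class ChannelLock:
    """Hypothetical cross-process guard: one lock file per channel name."""

    def __init__(self, channel, lock_dir=None):
        lock_dir = lock_dir or tempfile.gettempdir()
        # Assumed naming scheme; not a real Plucker convention.
        self.path = os.path.join(lock_dir, "plucker-%s.lock" % channel)
        self.fd = None

    def acquire(self):
        """Return True if we now own the lock, False if another run holds it."""
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so the
            # check and the creation happen as one step.
            self.fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        # Record the owner's PID to help diagnose stale locks by hand.
        os.write(self.fd, str(os.getpid()).encode("ascii"))
        return True

    def release(self):
        """Close and delete the lock file if this object owns it."""
        if self.fd is not None:
            os.close(self.fd)
            os.remove(self.path)
            self.fd = None
```

A caller would skip the update when acquire() returns False. The obvious weakness is that a crashed run leaves a stale lock file behind, so a real implementation would need some cleanup policy.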




Re: stayondomain/host, scheduler, memory leak

2002-09-26 Thread Robert O'Connor

 The switches stayondomain and stayonhost doesn't seem to work. 
  It still plucks pages beyond what I set it to do. When my logs were set 
  to extensive, this error shows up:
 
 Processing http://mobile.theonion.com/...
   Retrieved ok.
 Ignoring invalid link attribute 'stayondomain'
 Ignoring invalid link attribute 'stayondomain'
 Ignoring invalid link attribute 'stayondomain'
 
 I don't see stayondomain in the docs, the helpfile, or the 
 source.  stayonhost is in all of them and should work.

Tony is bang on. 'stayondomain' doesn't exist yet. It has been in the parser 
wishlist for about a year, but no one skilled enough has had a chance to 
implement it yet. Any takers? I will ship a fresh bottle of Bailey's to anyone 
who can help out ;-)

b) Scheduler - I've set a channel to update at 5am. Due to the fact 
  that there's *quite* a lot of stuff to download (about 1000++ items), it 
  takes quite a bit of time. This results in Plucker Desktop loading up 
  another instance of the plucking software to pluck the exact same channel 
  (It says Channel is due). Is there something that stops Plucker Desktop 
  from loading another instance of the same channel?

If you are using the progress dialog, then a new process won't pop up during 
the update (though this was fixed only recently). 

And if you are using either the progress dialog or console progress from the 
command-line plucker-desktop, it should be impossible for this to happen, 
since there is no timer running, and hence no possibility of starting a new 
update batch.

It will currently occur, however, if you are using console progress windows 
and plucker-desktop in non-command-line mode. Because the console windows are 
asynchronous, there is no way for Plucker Desktop to know when the final 
channel in the batch has terminated (and hence to know when it is safe to turn 
the faucet on or off to allow a new update to start). In the progress dialog, 
we do track the termination of each update because they are piped processes 
there, and we thus get an event notification when each process terminates.

I am still thinking about the best approach for the console windows. The best 
I have come up with so far is a message dialog that says 'click here when you 
are done', which would allow a new autoupdate to start once you click it.
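
[Editor's sketch] The "faucet" idea above can be illustrated by keeping a handle on the child process and refusing to start a new update while it is still running. This is a minimal sketch in modern Python, not Plucker Desktop's actual code; the UpdateGate class and its methods are hypothetical, and it polls the child rather than waiting for the event notification Robert describes for piped processes.

```python
import subprocess
import sys


class UpdateGate:
    """Hypothetical 'faucet': allows a new update only after the last one exits."""

    def __init__(self):
        self.child = None

    def busy(self):
        # Popen.poll() returns None while the child is still running.
        return self.child is not None and self.child.poll() is None

    def start(self, argv):
        """Start argv as a child process unless a previous one is still running."""
        if self.busy():
            return False
        self.child = subprocess.Popen(argv)
        return True


if __name__ == "__main__":
    gate = UpdateGate()
    # Stand-in for a long-running pluck of one channel.
    slow = [sys.executable, "-c", "import time; time.sleep(1)"]
    print(gate.start(slow))   # first update starts
    print(gate.start(slow))   # refused while the first is still running
    gate.child.wait()
```

This only works when the scheduler itself launches the update and keeps the process handle, which is exactly what the detached console windows do not give it.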
 
memory usage
The distiller uses a good deal of memory since there are a lot of links and 
information to manage. Stopping a second instance from initiating, as you 
mention, is the best solution.

Best wishes,
Robert.




Re: stayondomain/host, scheduler, memory leak

2002-09-26 Thread David A. Desrosiers


 Tony is bang on. 'stayondomain' doesn't exist yet. It has been in the
 parser wishlist for about a year, but no one skilled enough has had a
 chance to implement it yet.

I've been asking for it for almost 2 years, in fact, at the same
exact time that the stayonhost parameter was created, I asked for
stayondomain, which has a very different scope.

 Any takers? I will ship a fresh bottle of Bailey's to anyone who can help
 out ;-)

How far will you ship that =) (j/k). Seriously though, I've put out
a few ideas about how to implement it at a parser level, reversing the
string and reversing back. I'm not sure if Python has a robust domain
validation library, but it's not that hard to splice out the relevant bits
and only be left with the domain itself.

I'll try to hack up a standalone something this weekend (in perl
of course) to show as a proof of concept of how this can be done. Whomever
wants to roll that back into something the Python distiller can grok, go
ahead, I'll split the Baileys with you =)


d.
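
[Editor's sketch] The difference in scope between stayonhost and stayondomain can be shown with a few lines of modern Python. This is a rough sketch only, not the patch discussed in this thread: the function names are hypothetical, and it splits the host on dots and keeps the last two labels rather than using the reverse-the-string approach David mentions. That shortcut is wrong for suffixes such as co.uk and for bare IP addresses.

```python
from urllib.parse import urlsplit


def naive_domain(url, labels=2):
    """Return the last `labels` dot-separated parts of the URL's host."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-labels:])


def same_host(home_url, link_url):
    """stayonhost-style check: the full host names must match."""
    return urlsplit(home_url).hostname == urlsplit(link_url).hostname


def same_domain(home_url, link_url):
    """stayondomain-style check: only the trailing domain must match."""
    home = naive_domain(home_url)
    return bool(home) and home == naive_domain(link_url)


if __name__ == "__main__":
    home = "http://mobile.theonion.com/"
    link = "http://www.theonion.com/archive"
    print(same_host(home, link))    # different hosts
    print(same_domain(home, link))  # same trailing domain
```

With the home page on mobile.theonion.com, a link to www.theonion.com fails the host check but passes the domain check, which is the wider scope being asked for.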

perldoc -qa.j | perl -lpe '($_)=m((.*))'





Re: stayondomain/host, scheduler, memory leak

2002-09-26 Thread Martin Bodlak

On Thu, 26 Sep 2002, David A. Desrosiers wrote:

  Tony is bang on. 'stayondomain' doesn't exist yet. It has been in the
  parser wishlist for about a year, but no one skilled enough has had a
  chance to implement it yet.
 
   I've been asking for it for almost 2 years, in fact, at the same
 exact time that the stayonhost parameter was created, I asked for
 stayondomain, which has a very different scope.

Well, I think I have seen home_stayondomain somewhere... Isn't it the 
option you are looking for? 

MB


-- 
Martin Bodlak, Ostrava, Czech Republic
http://bodlak.hyperlink.cz, [EMAIL PROTECTED]
---
Navstivte take http://www.palmknihy.cz/





Re: stayondomain/host, scheduler, memory leak

2002-09-26 Thread Michael Nordström

On Thu, Sep 26, 2002, Robert O'Connor wrote:
 It has been in the parser wishlist for about a year, but no one
 skilled enough has had a chance to implement it yet. Any takers?
 I will ship a fresh bottle of Bailey's to anyone who can help out ;-)

Then ship it to Australia ;-) Alice Harris contributed a patch one
year ago, but it was never included in the parser...

/Mike




Re: stayondomain/host, scheduler, memory leak

2002-09-26 Thread David A. Desrosiers


 Here is the URL with Alice's kind patch:
 http://www.mail-archive.com/plucker-dev@rubberchicken.org/msg01626.html

And don't forget that the patch in that message is exactly the
opposite in diff format that you'll need to make it work. He diffed them the
wrong way (i.e. all lines with '-' should be '+' and vice versa).



d.

perldoc -qa.j | perl -lpe '($_)=m((.*))'





Re: stayondomain/host, scheduler, memory leak

2002-09-26 Thread Fringe Ryder

At 06:48 PM 9/26/2002  -0400, David A. Desrosiers wrote:
  Here is the URL with Alice's kind patch:
  http://www.mail-archive.com/plucker-dev@rubberchicken.org/msg01626.html

 And don't forget that the patch in that message is exactly the
opposite in diff format that you'll need to make it work. He diffed them the
wrong way (i.e. all lines with '-' should be '+' and vice versa).

When the request came through, I dove in to test my new Python skills and 
have a slightly different approach mostly-implemented against a new 
get.  Should I bother finishing it up?  To whom would I submit it upon 
completion?

 Tony McNamara




Stayondomain (Was Re: stayondomain/host, scheduler, memory leak)

2002-09-26 Thread Robert O'Connor

On 26 Sep 2002 at 15:58, Fringe Ryder wrote:

 At 06:48 PM 9/26/2002  -0400, David A. Desrosiers wrote:
   Here is the URL with Alice's kind patch:
   http://www.mail-archive.com/plucker-dev@rubberchicken.org/msg01626.html
 
  And don't forget that the patch in that message is exactly the
 opposite in diff format that you'll need to make it work. He diffed them the
 wrong way (i.e. all lines with '-' should be '+' and vice versa).
 
 When the request came through, I dove in to test my new Python skills and 
 have a slightly different approach mostly-implemented against a new 
 get.  Should I bother finishing it up?  To whom would I submit it upon 
 completion?

Bill Janssen is maintaining the parser in CVS, and is the chap to talk to.

If you (or someone else) would, by any chance, like to finally create another 
oft-requested feature, it would be support for including/excluding the alt 
text of images when they aren't a hyperlink:
#include alt_text
alt_text=1
#exclude alt_text
alt_text=0

This is so that you don't have to sift through a minefield of [img] all over a 
Plucker document, which happens often when crawling a site not especially 
designed for handhelds.
Implementation would be something along the lines of supporting the option 
from the command line and config, like other options. When it comes time to 
write the [img] or other alt text, it would be wrapped in conditions checking 
whether it is or isn't a hyperlink, and whether the config switch is on or off.
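
[Editor's sketch] The two conditions described above could be wrapped up roughly as follows. This is a minimal sketch, not parser code: the function name image_placeholder and its parameters are hypothetical, and it assumes that an image inside a hyperlink always keeps its text so the link remains selectable.

```python
def image_placeholder(alt, inside_link, alt_text_enabled):
    """Return the text to emit for an image, or None to emit nothing."""
    # Fall back to the generic marker when the image has no alt text.
    text = alt if alt else "[img]"
    if inside_link:
        # Assumption: linked images keep their text regardless of the switch.
        return text
    # Non-linked images are shown only when alt_text=1.
    return text if alt_text_enabled else None


if __name__ == "__main__":
    # With alt_text=0, a decorative image outside a link produces no output.
    print(image_placeholder("", inside_link=False, alt_text_enabled=False))
    # A linked image keeps its text even with alt_text=0.
    print(image_placeholder("logo", inside_link=True, alt_text_enabled=False))
```

The alt_text_enabled argument stands in for whatever the parser would read from the alt_text setting or its command-line equivalent.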

Best wishes,
Robert 