Re: waiting for new files in a directory

2000-12-28 Thread Peter Pentchev

On Thu, Dec 28, 2000 at 12:23:12PM +1300, Dan Langille wrote:
 On 27 Dec 2000, at 19:56, Peter Pentchev wrote:
 
  On Wed, Dec 27, 2000 at 09:16:34AM -0800, Alfred Perlstein wrote:
   * Dan Langille [EMAIL PROTECTED] [001226 23:50] wrote:

    My idea is to have a daemon, or something resembling one, sitting on
    the box watching the directory.  When a new file appears, it starts a perl
    script.  This perl script is beyond the scope of my question, but it
    processes all the files in the directory.  When finished, it looks for any
    more files and repeats as necessary.  If no more files, it exits.

   
   This isn't an answer to your main question (I see it's already been
   discussed), but you may be able to set up a kevent on the
   directory which should inform you if any files are added to it.
  
  Unfortunately, I gather that Dan intends to write most of the FreshPorts
  code in Perl, and AFAIK, Perl has no kqueue/kevent interface :(
 
 Unfortunately?  *grin*  FWIW, most of the existing and new code will be
 PHP based.  Perl is used primarily for importing data from cvs-all, and
 for various mailings out to users.

The 'unfortunately' part was not to say that I don't like Perl, or that
I don't think it should be written in Perl; rather, that at the moment,
Perl has no easy way of using the kqueue/kevent interface.  If there were
such an iface for Perl, it would all be done with one little filter
invoked from procmail to write the message, and one sleepy Perl thing,
idling in a kevent() call most of the time, and only waking up when
there are changes to the dir.

Hmm.  On second thoughts, I wonder if the sleep/opendir method might
not work better under temporarily high load - even better than the
cron-based one.  If a bunch of mails arrive at the same time.. hmm
I should play around with kevent to see how it could handle this -
notifying me for each and every message could be suboptimal.
The sleep/opendir way would process as many new messages as there
have arrived; ditto for the cron-based one, *except* that if there
are too many messages, there could be two or three Perl interpreter
invocations, which find an old script still running, and die quietly,
having used up some CPU resources in the meantime.

  Thus, to make use of kevent (which I certainly agree would be a better
  FreeBSD-specific solution), he'd have to either 1. have a C program
  which spawns Perl and his script on every change, or 2. have a C program
  which spawns Perl once and signals it on every change.
  
  The first way would be downright stupid IMHO..  The second one may
  very well be more efficient than the readdir, sleep solution which
  I proposed in other postings, seeing that Dan wants to process
  the cvs-all mailings, which certainly do not arrive every few seconds :)
 
 I like the 2nd concept.  It appeals to me.  I haven't done any C in about 
 7 years and all of that was in Windows.  Never in a Unix environment.  
 This solution is more complex than the "cron job every minute" which I 
 discussed with Mark, but it fits with my goal of having processed the 
 cvs-all messages as quickly as I can.

I could play around with kevent in a couple of days to see how it
behaves when multiple messages arrive.  When a file or multiple files
arrive, the sleeper would have to go through the opendir/readdir
dance, and either only process the first file it finds, or process them
all.  In the second case, if multiple files should arrive, those would
be all processed in response to one event, and the next events would
trigger lots of opendir/readdir/closedir calls with no files found.

Hmm.. as a side note..  I'm not quite sure how kqueues operate on
vnodes.  If I should request an EVFILT_VNODE filter with NOTE_WRITE,
receive an event, find a new file, then unlink() it (which involves
writing to the vnode I'm monitoring), will *my* write generate
another event I'd have to process?

G'luck,
Peter

-- 
You have, of course, just begun reading the sentence that you have just finished 
reading.


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: waiting for new files in a directory

2000-12-28 Thread Dan Langille

On 28 Dec 2000, at 10:50, Peter Pentchev wrote:

 Hmm.  On second thoughts, I wonder if the sleep/opendir method might
 not work better under temporarily high load - even better than the
 cron-based one.  If a bunch of mails arrive at the same time.. hmm
 I should play around with kevent to see how it could handle this -
 notifying me for each and every message could be suboptimal.

I would appreciate that very much.

 I could play around with kevent in a couple of days to see how it
 behaves when multiple messages arrive.  When a file or multiple files
 arrive, the sleeper would have to go through the opendir/readdir
 dance, and either only process the first file it finds, or process them
 all.  In the second case, if multiple files should arrive, those would
 be all processed in response to one event, and the next events would
 trigger lots of opendir/readdir/closedir calls with no files found.

I'll include my thoughts in case they help:

What about a daemon signalling a waiting perl script?  The script would 
wake up, take the first file, process it, repeat until no more files, then go 
back to sleep.

Is it an issue if the daemon signals the perl script when it's already 
processing?  Could a signal be missed?

thank you.

--
Dan Langille
The FreeBSD Diary - http://freebsddiary.org/
   FreshPorts - http://freshports.org/
 NZ Broadband - http://unixathome.org/broadband/





Re: waiting for new files in a directory

2000-12-28 Thread Volker Stolz

Am 28. Dec 2000 um 10:33 MET schrieb Dan Langille:
 What about a daemon signalling a waiting perl script?
 Is it an issue if the daemon signals the perl script when it's already 
 processing?  Could a signal be missed?

How about using a FIFO (maybe in /tmp) and let the daemon printf,echo,cat,...
control-msgs into the FIFO and have a perl script sitting on the other end?
Signals suck. Another advantage would be that the perl script could choose
its own pace and let things queue up in the FIFO. However, a FIFO only
has limited capacity. If I were using Haskell (http://www.haskell.org), I'd
throw in a forkIO() and would get a neatly multi-threaded solution where one
thread reads the FIFO and queues up requests while the other thread queries
it for more work -- I don't know about threaded perl, though.
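
A rough sketch of the FIFO idea, in case it helps.  The FIFO path and
the "new-file" message are made up for illustration, and the forked
child only stands in for the watching daemon; treat it as a starting
point under those assumptions, not a finished design.

```perl
#!/usr/bin/perl -w
use strict;
use POSIX qw(mkfifo);

# Hypothetical FIFO path; the thread only says "maybe in /tmp".
my $fifo = "/tmp/spool-notify." . $$;

mkfifo($fifo, 0600) or die("mkfifo $fifo: $!\n");

# Stand-in for the watching daemon: a child that writes two
# control messages into the FIFO and exits.
my $pid = fork();
die("fork: $!\n") unless defined $pid;
if ($pid == 0) {
    open(my $w, '>', $fifo) or die("Opening $fifo for writing: $!\n");
    print $w "new-file\n";
    print $w "new-file\n";
    close($w);
    exit(0);
}

# The reader blocks in open() until a writer shows up, then in
# readline until a message arrives; it sees EOF once every writer
# has closed its end.
open(my $r, '<', $fifo) or die("Opening $fifo for reading: $!\n");
my $count = 0;
while (my $msg = <$r>) {
    chomp($msg);
    $count++;
    print "got '$msg', time to scan the spool directory\n";
}
close($r);
waitpid($pid, 0);
unlink($fifo) or warn("Removing $fifo: $!\n");
print "reader saw $count messages\n";
```

A long-running reader would reopen the FIFO after EOF instead of
exiting, since EOF only means that the current writers have gone away.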
-- 
\usepackage[latin1]{inputenc}!
Volker Stolz * [EMAIL PROTECTED] * PGP + S/MIME





Re: waiting for new files in a directory

2000-12-28 Thread Peter Wemm

Volker Stolz wrote:
 Am 28. Dec 2000 um 10:33 MET schrieb Dan Langille:
  What about a daemon signalling a waiting perl script?
  Is it an issue if the daemon signals the perl script when it's already 
  processing?  Could a signal be missed?
 
 How about using a FIFO (maybe in /tmp) and let the daemon printf,echo,cat,...
 control-msgs into the FIFO and have a perl script sitting on the other end?
 Signals suck. Another advantage would be that the perl script could choose
 its own pace and let things queue up in the FIFO. However, a FIFO only
 has limited capacity. If I were using Haskell (http://www.haskell.org), I'd
 throw in a forkIO() and would get a neatly multi-threaded solution where one
 thread reads the FIFO and queues up requests while the other thread queries
 it for more work -- I don't know about threaded perl, though.

This sort of thing is why we added poll(2) and later kqueue(2) support
for getting notifications on directory changes..  eg: you can get an event
to tell you that a new file "appeared" in your directory.

Cheers,
-Peter
--
Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
"All of this is for nothing if we don't go to the stars" - JMS/B5






Re: waiting for new files in a directory

2000-12-28 Thread Dan Langille

On 28 Dec 2000, at 11:29, Volker Stolz wrote:

 Am 28. Dec 2000 um 10:33 MET schrieb Dan Langille:
  What about a daemon signalling a waiting perl script?
  Is it an issue if the daemon signals the perl script when it's already
  processing?  Could a signal be missed?

 How about using a FIFO (maybe in /tmp) and let the daemon printf,echo,cat,...
 control-msgs into the FIFO and have a perl script sitting on the other end?

That sounds good to me.  It meets the criteria.

 Signals suck. Another advantage would be that the perl script could choose
 its own pace and let things queue up in the FIFO. However, a FIFO only
 has limited capacity.

Given that we are processing incoming messages from cvs-all, I don't
think we'll meet that capacity (not that I know what the capacity is).

 If I were using Haskell (http://www.haskell.org), I'd
 throw in a forkIO() and would get a neatly multi-threaded solution where one
 thread reads the FIFO and queues up requests while the other thread queries
 it for more work -- I don't know about threaded perl, though.

That sounds great.  But without knowing more, I think it's too much for
the task at hand.  I would like to keep things simple and free from
complexity.  Writing a multi-threaded solution, unless someone else
wants to do it, may be too big of a task for me.  Volunteers?  ;)

thank you.

--
Dan Langille
The FreeBSD Diary - http://freebsddiary.org/
   FreshPorts - http://freshports.org/
 NZ Broadband - http://unixathome.org/broadband/





Re: waiting for new files in a directory

2000-12-28 Thread Peter Pentchev

On Thu, Dec 28, 2000 at 11:36:50PM +1300, Dan Langille wrote:
 On 28 Dec 2000, at 11:29, Volker Stolz wrote:
 
  Am 28. Dec 2000 um 10:33 MET schrieb Dan Langille:
   What about a daemon signalling a waiting perl script?
   Is it an issue if the daemon signals the perl script when it's already 
   processing?  Could a signal be missed?
  
  How about using a FIFO (maybe in /tmp) and let the daemon printf,echo,cat,...
  control-msgs into the FIFO and have a perl script sitting on the other end?
 
 That sounds good to me.  It meets the criteria.

Actually, there's no need for the FIFO.  What I've been thinking about
is a little C program that spawns a Perl script, then sits, watching
the spool directory through the kevent interface.  When a new file
appears, the parent lets the child know - this need not be signal-based,
I'm thinking more along the lines of writing to a previously-opened pipe.

This has the added benefit that the parent can monitor the child's status,
and respawn it if it dies; with separate processes and a FIFO, if the reader
dies, the writer either blocks or goes haywire, judging from my (admittedly
limited) experience.  Handling SIGCHLD and respawning seems easy :)  Also,
the Perl child can find out the parent has died (the pipe shall close or
something), and die gracefully, to be reborn as the parent is respawned.
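
A minimal sketch of the pipe arrangement, with both ends written in
Perl only so that it fits in one file; the proposal above has a C
parent blocked in kevent(), which this does not attempt.  The
"changed" message and the three simulated events are assumptions for
the demo.

```perl
#!/usr/bin/perl -w
use strict;
use IO::Handle;

# Both roles are in Perl here only to keep the sketch in one file;
# the thread proposes a C parent blocked in kevent().
pipe(my $rd, my $wr) or die("pipe: $!\n");

my $pid = fork();
die("fork: $!\n") unless defined $pid;

if ($pid == 0) {
    # Child: the long-lived worker.
    close($wr);
    while (defined(my $line = <$rd>)) {
        # One line per notification; a real worker would scan the
        # spool directory here.
        print "worker: woken up\n";
    }
    # EOF means every write end is closed, i.e. the parent is gone.
    print "worker: parent went away, exiting\n";
    exit(0);
}

# Parent: the watcher.
close($rd);
$wr->autoflush(1);
print $wr "changed\n" for (1 .. 3);   # stand-in for three directory events
close($wr);                           # simulates the parent exiting

# waitpid() is also where a real parent would notice a dead worker
# and respawn it.
waitpid($pid, 0);
my $status = $? >> 8;
print "parent: worker exited with status $status\n";
```

The worker needs no signal handling at all: it just blocks in a read
and treats end-of-file as "the parent died".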

Respawning the parent could be done as either a /etc/inittab-respawned
process, or a service running under svscan from DJB's daemontools package.
The latter case also has an almost-built-in logging facility with support
for log rotation through multilog.

G'luck,
Peter

-- 
If you think this sentence is confusing, then change one pig.





Re: waiting for new files in a directory

2000-12-28 Thread Kris Kennaway

On Thu, Dec 28, 2000 at 02:35:19AM -0800, Peter Wemm wrote:

 This sort of thing is why we added poll(2) and later kqueue(2) support
 for getting notifications on directory changes..  eg: you can get an event
 to tell you that a new file "appeared" in your directory.

See how the l0pht-watch port does exactly this. In fact you could
probably use that program as-is - I think it has the capability to
execute another process on file creation..

Kris



Re: waiting for new files in a directory

2000-12-28 Thread Dag-Erling Smorgrav

What are you guys smoking? Use cron to run a spool scanning job every
minute or so, and use a lock file to make sure one doesn't start until
the previous one is done. Note that reliable locking is non-trivial in
Perl; a quick workaround is to use a lock directory instead (mkdir()
will fail if the directory exists; make sure to differentiate between
"somebody already holds the lock" and "the lock can't be created due
to permission errors or some other problem" by examining $!)
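
For what it's worth, the lock-directory workaround might look roughly
like this.  The lock path and the sub names are placeholders, not
anything from an existing script.

```perl
#!/usr/bin/perl -w
use strict;
use Errno qw(EEXIST);

# Returns 1 if we took the lock, 0 if somebody else holds it;
# dies on any other error (permissions, missing parent, ...).
sub TryLock {
    my $lockdir = shift;
    return 1 if mkdir($lockdir, 0700);
    return 0 if $! == EEXIST;
    die("Creating lock $lockdir: $!\n");
}

sub Unlock {
    my $lockdir = shift;
    rmdir($lockdir) or warn("Removing lock $lockdir: $!\n");
}

# Hypothetical lock path, for illustration only.
my $lock = "/tmp/spoolscan.lock";
if (TryLock($lock)) {
    print "lock taken, scanning spool\n";
    # ... process the spool here ...
    Unlock($lock);
} else {
    print "previous run still active, exiting\n";
}
```

Note that a job which crashes between TryLock() and Unlock() leaves
the directory behind, so something still has to clean up stale locks.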

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]





Re: waiting for new files in a directory

2000-12-28 Thread Volker Stolz

On Thu, Dec 28, 2000 at 01:35:08PM +0100, Dag-Erling Smorgrav wrote:
 What are you guys smoking?

*shrug* Can you spell "event-driven"? There are ways to do things much
more elegantly today (see all the references to kevent()).
-- 
\usepackage[latin1]{inputenc}!
Volker Stolz * [EMAIL PROTECTED] * PGP + S/MIME





Re: waiting for new files in a directory

2000-12-28 Thread Peter Pentchev

On Thu, Dec 28, 2000 at 01:35:08PM +0100, Dag-Erling Smorgrav wrote:
 What are you guys smoking? Use cron to run a spool scanning job every
 minute or so, and use a lock file to make sure one doesn't start until
 the previous one is done. Note that reliable locking is non-trivial in
 Perl; a quick workaround is to use a lock directory instead (mkdir()
 will fail if the directory exists; make sure to differentiate between
 "somebody already holds the lock" and "the lock can't be created due
 to permission errors or some other problem" by examining $!)

I've tried this; and I still believe that a process continuously
watching the directory is better than a cronjob for several reasons,
which I have outlined in a previous mail.  First, there is *no* need
for locking if a single process is there all the time; this eliminates
all sorts of locking problems.  Second, there is no overhead in starting
Perl (yeah, yeah, so it's cached after the first few times, but still..)
each and every minute just to find nothing and die quietly - as somebody
else said, that's exactly why poll(2) and later kqueue/kevent work on
directory vnodes.  Third, if a process uses poll(2) or kqueue, it will
react to new mails the moment they arrive, not up to a minute later.

G'luck,
Peter

-- 
because I didn't think of a good beginning of it.





Re: waiting for new files in a directory

2000-12-28 Thread Dag-Erling Smorgrav

Volker Stolz [EMAIL PROTECTED] writes:
 On Thu, Dec 28, 2000 at 01:35:08PM +0100, Dag-Erling Smorgrav wrote:
  What are you guys smoking?
 *shrug* Can you spell "event-driven"? There are ways to do things much
 more elegantly today (see all the references to kevent()).

I choose simple and working over elegant.

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]





Re: waiting for new files in a directory

2000-12-28 Thread Peter Pentchev

On Thu, Dec 28, 2000 at 01:44:34PM +0100, Dag-Erling Smorgrav wrote:
 Volker Stolz [EMAIL PROTECTED] writes:
  On Thu, Dec 28, 2000 at 01:35:08PM +0100, Dag-Erling Smorgrav wrote:
   What are you guys smoking?
  *shrug* Can you spell "event-driven"? There are ways to do things much
  more elegantly today (see all the references to kevent()).
 
 I choose simple and working over elegant.

I think opendir-readdir-closedir-sleep is a bit simpler than the locking
you yourself admit is non-trivial :)

G'luck,
Peter

-- 
When you are not looking at it, this sentence is in Spanish.





Re: waiting for new files in a directory

2000-12-28 Thread Dag-Erling Smorgrav

Peter Pentchev [EMAIL PROTECTED] writes:
 On Thu, Dec 28, 2000 at 01:44:34PM +0100, Dag-Erling Smorgrav wrote:
  Volker Stolz [EMAIL PROTECTED] writes:
   On Thu, Dec 28, 2000 at 01:35:08PM +0100, Dag-Erling Smorgrav wrote:
    What are you guys smoking?
   *shrug* Can you spell "event-driven"? There are ways to do things much
   more elegantly today (see all the references to kevent()).
  I choose simple and working over elegant.
 I think opendir-readdir-closedir-sleep is a bit simpler than the locking
 you yourself admit is non-trivial :)

Locking in Perl is a known problem with a known solution which takes
me five or ten minutes to implement off the top of my head, and I
don't trust <insert your favorite startup script here> not to start
multiple copies of the spool scanner.

You can of course write the scanner in such a way that multiple
instances can run in parallel without harm even without locking; this
is left as an exercise for the reader.
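
One possible answer to the exercise, offered as a sketch rather than
as the solution intended here: let each instance claim a file by
renaming it to a private name before touching it.  rename() is atomic
within a filesystem, so only one instance wins.  The ".PID.work"
naming scheme and the .cvs suffix filter are assumptions.

```perl
#!/usr/bin/perl -w
use strict;

# Try to claim $dir/$fname for this process.  Returns the claimed
# path, or undef if another instance got there first.
sub Claim {
    my ($dir, $fname) = @_;
    # Hypothetical naming scheme: original name plus PID plus ".work".
    my $mine = sprintf("%s/%s.%d.work", $dir, $fname, $$);
    return $mine if rename("$dir/$fname", $mine);
    return undef;
}

# Scan the directory once; returns the number of files processed.
sub OnePass {
    my $dir = shift;
    opendir(my $dh, $dir) or die("Opening $dir: $!\n");
    my @files = grep { /\.cvs$/ } readdir($dh);
    closedir($dh);
    my $done = 0;
    foreach my $fname (@files) {
        # Losing the race is not an error, just skip the file.
        my $path = Claim($dir, $fname) or next;
        print "Processing $path\n";
        unlink($path) or warn("Removing $path: $!\n");
        $done++;
    }
    return $done;
}
```

An instance that dies halfway leaves its ".work" file behind, so some
periodic sweep for abandoned claims would still be needed.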

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]





Re: waiting for new files in a directory

2000-12-27 Thread Mark Murray

 Any ideas on how to do this?  Any suggestions on the process?

Simple lock (like flock(2)) in the perl script. Lock some ${FILE},
and if you can't get the lock, die. The file should contain the PID
of the process that holds the lock, so that a cleanerd can kill
stuck processes, or so that the lock can be blown away if needed.

Works like a charm.
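
A sketch of what that could look like in Perl, assuming a lock file
path of my own invention and a non-blocking exclusive flock():

```perl
#!/usr/bin/perl -w
use strict;
use Fcntl qw(:flock O_RDWR O_CREAT);

# Returns an open, locked filehandle, or undef if the lock could
# not be taken (normally because another process holds it).  The
# lock lasts until the handle is closed or the process exits.
sub TakeLock {
    my $file = shift;
    # Create without truncating, so a loser does not wipe the
    # winner's PID.
    sysopen(my $fh, $file, O_RDWR | O_CREAT, 0644)
        or die("Opening $file: $!\n");
    unless (flock($fh, LOCK_EX | LOCK_NB)) {
        close($fh);
        return undef;
    }
    truncate($fh, 0) or die("Truncating $file: $!\n");
    my $old = select($fh); $| = 1; select($old);   # unbuffered
    print $fh "$$\n";
    return $fh;
}

# Hypothetical lock path; pass another one as the first argument.
my $lockfile = shift(@ARGV) || "/tmp/process_cvs.lock";
my $lock = TakeLock($lockfile);
if ($lock) {
    print "lock held by $$, processing spool\n";
    # ... process files ...
    truncate($lock, 0);
    close($lock);   # releases the lock
} else {
    print "another instance is running, exiting\n";
}
```

Because the kernel drops the lock when the holder exits, a crashed
script cannot leave a stale lock behind; the PID in the file is only
there for the cleaner.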

M
--
Mark Murray
Warning: this .sig is umop ap!sdn





Re: waiting for new files in a directory

2000-12-27 Thread Dan Langille

On 27 Dec 2000, at 10:11, Mark Murray wrote:

  Any ideas on how to do this?  Any suggestions on the process?
 
 Simple lock (like flock(3)) in the perl script. Lock some ${FILE},
 and if you can't get the lock, die. The file should contain the PID
 of the process that holds the lock, so that a cleanerd can kill
 stuck processes, or so that the lock can be blown away if needed.
 
 Works like a charm.

Thanks Mark.  But what part of the solution does flock solve?

I'm not sure if my lack of comprehension stems from my initial
description being inadequate or my knowledge being too narrow.

--
Dan Langille
The FreeBSD Diary - http://freebsddiary.org/
   FreshPorts - http://freshports.org/
 NZ Broadband - http://unixathome.org/broadband/





Re: waiting for new files in a directory

2000-12-27 Thread Peter Pentchev

On Wed, Dec 27, 2000 at 08:49:51PM +1300, Dan Langille wrote:
 FreshPorts2 will have a new processing strategy for incoming 
 messages.  Each message will be in a separate file in a predetermined 
 directory. As each file arrives, it is processed by a perl script.  I want 
 only one instance of that perl script running at a given time.  This is 
 primarily for serialization and to ensure the system doesn't get bogged 
 down running perl scripts if many messages arrive in a short period of 
 time.
 
 My idea is to have a daemon, or something resembling one, sitting on 
 the box watching the directory.  When a new file appears, it starts a perl 
 script.  This perl script is beyond the scope of my question, but it
 processes all the files in the directory.  When finished, it looks for any 
 more files and repeats as necessary.  If no more files, it exits.
 
 If a file arrives, the daemon checks to see if the perl script is already 
 running.  If so, it doesn't start another one.
 
 Any ideas on how to do this?  Any suggestions on the process?

I would do that (and have done it in several projects) using opendir()
and readdir().  Open the directory, read entry by entry, when you find
a file you want, process it and unlink() it.  Get to the end of the dir,
sleep, repeat.

Beware of a subtle problem here though - see that you do not have
the process which creates files creating them in that directory; you
might very well wind up with a file being processed before it's fully
created.  There are two solutions to this problem - either DJB's
Maildir style, or processing files based on filenames.

DJB's Maildir concept is based on having two directories - a temporary
one where files are created and then atomically move/rename'd to
the real one.  This works best when the tempdir and the real dir are
located on the same filesystem, and you can use the rename() syscall.
However, there is a solution if you want the temporary dir on another
filesystem - there is a safecat program, which I shall shortly commit
a port for (it's been sitting in my to-do tree for several weeks now).

The other way is create the files in the same directory, but with
a different name style, e.g. ending in .tmp; then when you readdir()
an entry, only process those not ending in .tmp, or only process those
ending in .xml, or something like that.  This might be a bit easier
to implement.
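
A sketch of the writer's side for the second approach, assuming the
.tmp and .cvs suffixes mentioned above; the sub name and arguments are
made up.  The Maildir variant differs only in that the temporary file
lives in a separate directory on the same filesystem.

```perl
#!/usr/bin/perl -w
use strict;

# Write $data under a temporary name, then rename it into place.
# The .tmp/.cvs suffixes follow the naming idea above; the base
# name is whatever the caller passes in.
sub Deliver {
    my ($dir, $base, $data) = @_;
    my $tmp   = "$dir/$base.tmp";
    my $final = "$dir/$base.cvs";

    open(my $fh, '>', $tmp) or die("Creating $tmp: $!\n");
    print $fh $data;
    close($fh) or die("Writing $tmp: $!\n");

    # rename() is atomic within one filesystem: the reader sees
    # either no .cvs file or a complete one.
    rename($tmp, $final) or die("Renaming $tmp to $final: $!\n");
    return $final;
}
```

The reader then only has to skip anything that does not end in .cvs.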

G'luck,
Peter

-- 
If there were no counterfactuals, this sentence would not have been paradoxical.





Re: waiting for new files in a directory

2000-12-27 Thread Dan Langille

On 27 Dec 2000, at 12:11, Peter Pentchev wrote:

 I would do that (and have done it in several projects) using opendir()
 and readdir().  Open the directory, read entry by entry, when you find
 a file you want, process it and unlink() it.  Get to the end of the dir,
 sleep, repeat.

Thanks for that.

Do you have code I can use as a starting position?

 DJB's Maildir concept is based on having two directories - a temporary
 one where files are created and then atomically move/rename'd to
 the real one.  This works best when the tempdir and the real dir are
 located on the same filesystem, and you can use the rename() syscall.

At present the files are created through procmail like this:

|/usr/bin/perl $HOME/process_cvs_mail.pl > ~/msgs/$FILE

I guess I could add a rename.

cheers

--
Dan Langille
The FreeBSD Diary - http://freebsddiary.org/
   FreshPorts - http://freshports.org/
 NZ Broadband - http://unixathome.org/broadband/





Re: waiting for new files in a directory

2000-12-27 Thread Dima Dorfman

 On 27 Dec 2000, at 10:11, Mark Murray wrote:
  
  [use flock(2)]
 
 But what part of the solution does flock solve?

It solves the problem of finding out whether the Perl script is
already running, but as I understood the original posting, this isn't
what you were asking.  See below.

 I'm not sure if my lack of comprehension stems from my initial
 description being inadequate or my knowledge being too narrow.

Probably from it being a little confusing.  Here's how I understand
it.  You have some program putting files in directory /x.  You need
something that will be notified when a new file appears in /x.  That
something then starts a Perl script to process the files.

If you control the program that's putting files into /x, the easiest
way would be to have it send a signal to your daemon.  You can put its
PID in a well-known file for it to look at.  If, however, you don't
control the program, you may have to resort to looking at the
directory every now and then and checking for new files (``polling'').
Depending on your application, this may or may not be acceptable.
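
To make the signal idea concrete, here is a small sketch with both
sides in one script.  SIGUSR1 and the PID file path are my choices for
illustration, and the script signals itself only so that it can be run
standalone.

```perl
#!/usr/bin/perl -w
use strict;

my $wakeups = 0;
# Keep the handler tiny: just note that something happened.
$SIG{USR1} = sub { $wakeups++ };

# Daemon side: publish our PID in a well-known file.  The PID in
# the name only keeps this demo from colliding with other runs; a
# real daemon would use a fixed name.
my $pidfile = "/tmp/spoolwatch." . $$ . ".pid";   # hypothetical path
open(my $out, '>', $pidfile) or die("Writing $pidfile: $!\n");
print $out "$$\n";
close($out);

# Producer side (normally a separate program): read the PID and
# signal it after dropping a file into the directory.
sub Notify {
    my $file = shift;
    open(my $in, '<', $file) or die("Reading $file: $!\n");
    my $pid = <$in>;
    close($in);
    die("Empty PID file $file\n") unless defined $pid;
    chomp($pid);
    die("Bad PID in $file\n") unless $pid =~ /^(\d+)$/;
    return kill('USR1', $1);
}

Notify($pidfile);
# sleep() returns early when a signal arrives; if the signal was
# already handled there is nothing to wait for.
sleep(1) unless $wakeups;
print "woken $wakeups time(s)\n";
unlink($pidfile) or warn("Removing $pidfile: $!\n");
```

Ordinary signals are not queued, so several notifications can collapse
into one; the daemon should rescan the whole directory on every wakeup
instead of assuming one signal per file.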

If you don't want to use polling, you might try fooling around with
the select(2), poll(2), or kqueue(2) interfaces.  The former two were
designed to be used with regular files or sockets, but in unix, a
directory is just a special type of file.  I don't know how they'd
react to it.  In particular, the EVFILT_VNODE filter with the
NOTE_EXTEND event/flag (notifies you when the file descriptor
specified was extended) looks promising.

Then again, I'm not a filesystem whiz, so this may all be nonsense.
Hopefully I've at least interpreted your question correctly.

Regards

Dima Dorfman
[EMAIL PROTECTED]






Re: waiting for new files in a directory

2000-12-27 Thread Peter Pentchev

On Wed, Dec 27, 2000 at 11:17:47PM +1300, Dan Langille wrote:
 On 27 Dec 2000, at 12:11, Peter Pentchev wrote:
 
  I would do that (and have done it in several projects) using opendir()
  and readdir().  Open the directory, read entry by entry, when you find
  a file you want, process it and unlink() it.  Get to the end of the dir,
  sleep, repeat.
 
 Thanks for that.
 
 Do you have code I can use as a starting position?

Try the attached file, it works for me.

Btw, anybody reading this discussion - I tried the attached script with
#!/usr/bin/perl -wT, and Perl died on the unlink() - "unsafe dependency".
What gives?

 
  DJB's Maildir concept is based on having two directories - a temporary
  one where files are created and then atomically move/rename'd to
  the real one.  This works best when the tempdir and the real dir are
  located on the same filesystem, and you can use the rename() syscall.
 
 At present the files are created through procmail like this:
 
 |/usr/bin/perl $HOME/process_cvs_mail.pl > ~/msgs/$FILE
 
 I guess I could add a rename.

Something like..
| /usr/bin/perl $HOME/process.pl > ~/msgs/$FILE.tmp && \
  mv ~/msgs/$FILE.tmp ~/msgs/$FILE.cvs

..or alternatively, use safecat (which I shall commit real-soon-now), and..
| /usr/bin/perl process.pl | /usr/local/bin/safecat ~/msgs/tmpdir/ ~/msgs/

safecat takes two arguments - a temp dir and the real dir - reads stdin,
and stores it there.

G'luck,
Peter

-- 
What would this sentence be like if pi were 3?

#!/usr/bin/perl -w
# $Id: procdir.pl,v 1.1 2000/12/27 10:48:30 roam Exp $

use strict;

sub OnePass {
    my $dir = (shift || "");
    my ($fname, @files);

    die("OnePass() requires a dir argument\n") if ($dir eq "");
    opendir(D, $dir) or die("Opening $dir: $!\n");
    @files = readdir(D);
    closedir(D);
    foreach $fname (@files) {
        next if (($fname eq ".") || ($fname eq ".."));
        # more filename vailidity checks go here
        next unless $fname =~ /\.cvs$/;

        # ok, we want this file
        print "Processing $dir/$fname\n";

        # done with it..
        unlink("$dir/$fname") or warn("Removing $dir/$fname: $!\n");
        # this is evil - if we could process it, but could not
        # remove it, we might end up processing it again at the next
        # iteration :(
    }
}

sub ProcessDir {
    my $dir = (shift || "");

    die("ProcessDir() requires a dir argument\n") if ($dir eq "");
    for (;;) {
        OnePass($dir);
        # this could be done with select(), with a signal handler,
        # many different ways..  polling and sleep() is easy
        sleep(2);
    }
}

MAIN:{
    # obtain directory name in some way
    my $d = "/tmp";
    ProcessDir($d);
    # er heh.. this should never return :)
    die("ProcessDir() returned?.. $!\n");
}





Re: waiting for new files in a directory

2000-12-27 Thread Dan Langille

On 27 Dec 2000, at 10:11, Mark Murray wrote:

  Any ideas on how to do this?  Any suggestions on the process?
 
 Simple lock (like flock(3)) in the perl script. Lock some ${FILE},
 and if you can't get the lock, die. The file should contain the PID
 of the process that holds the lock, so that a cleanerd can kill
 stuck processes, or so that the lock can be blown away if needed.
 
 Works like a charm.

Mark and I have been msging offline and he's agreed to my posting the 
results of our discussion:

Thanks Mark.  But what part of the solution does flock solve?
   
   It prevents more than one perl script from running. You can then 
   cron perl scripts to deal with the incoming, and not worry about
   them jumping on each other. 
  
  Yes.  That does make some things much easier.  That's a very
  simple solution.
  
  I was looking for a gold-plated solution where messages are 
  processed right away.  But it sounds too complicated.  I guess 
  setting up a cron job to run every minute is fine.  
  
  The perl script looks like this:
  
  flock a file, if it fails, die.
 
 Write PID to flocked file.
 
  Loop
    Get oldest file in directory (files are named Y.m.d.h.m.s.PID)
    process it
    move file to archives
  until no more files
 
 Truncate file
 
  unlock the file
  
  The cleaner you mentioned: run it every 15 minutes, compare the 
  date/time on the lockfile, if more than 15 minutes old, grab the PID,  
  and kill the job, remove the lock.
 
 Correct.
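
The loop in the middle of that outline could be sketched as follows.
This version picks the oldest file by modification time instead of
parsing the Y.m.d.h.m.s.PID names, and it assumes the lock file lives
outside the spool directory; both are my assumptions, as are the sub
names.

```perl
#!/usr/bin/perl -w
use strict;
use File::Copy qw(move);

# Return the oldest plain file in $dir (by modification time,
# ties broken by name), or undef if there is none.
sub OldestFile {
    my $dir = shift;
    opendir(my $dh, $dir) or die("Opening $dir: $!\n");
    my @files = grep { -f "$dir/$_" } readdir($dh);
    closedir($dh);
    my ($oldest) = sort {
        (stat("$dir/$a"))[9] <=> (stat("$dir/$b"))[9] or $a cmp $b
    } @files;
    return $oldest;
}

# Process files oldest-first until the spool is empty; returns the
# number of files handled.
sub DrainSpool {
    my ($spool, $archive) = @_;
    my $count = 0;
    while (defined(my $fname = OldestFile($spool))) {
        print "Processing $spool/$fname\n";
        # Dying here avoids looping forever on a file that cannot
        # be moved out of the spool.
        move("$spool/$fname", "$archive/$fname")
            or die("Archiving $fname: $!\n");
        $count++;
    }
    return $count;
}
```

Rescanning after every file costs a little, but it means messages that
arrive while the script is busy are picked up in the same run.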

Thanks Mark.

--
Dan Langille
The FreeBSD Diary - http://freebsddiary.org/
   FreshPorts - http://freshports.org/
 NZ Broadband - http://unixathome.org/broadband/





Re: waiting for new files in a directory

2000-12-27 Thread Mike Bristow

On Wed, Dec 27, 2000 at 12:53:37PM +0200, Peter Pentchev wrote:
 Btw, anybody reading this discussion - I tried the attached script with
 #!/usr/bin/perl -wT, and Perl died on the unlink() - "unsafe dependency".
 What gives?

$ man perldiag
[snip]
   Insecure dependency in %s
   (F) You tried to do something that the tainting
   mechanism didn't like.  The tainting mechanism is
   turned on when you're running setuid or setgid, or
   when you specify -T to turn it on explicitly.  The
   tainting mechanism labels all data that's derived
   directly or indirectly from the user, who is
   considered to be unworthy of your trust.  If any such
   data is used in a "dangerous" operation, you get this
   error.  See the perlsec manpage for more information.
[snip]

Note that a filename you get from readdir is (indirectly) from the
user, and unlink counts as dangerous.

Basically, you need to "untaint" $fname in OnePass before using it in
the unlink call; this is fairly trivial to do, and if you can't work it 
out from perlsec(1), feel free to contact me off-list.

-- 
Mike Bristow, seebitwopie  





Re: waiting for new files in a directory

2000-12-27 Thread Peter Pentchev

On Wed, Dec 27, 2000 at 11:09:40AM +, Mike Bristow wrote:
 On Wed, Dec 27, 2000 at 12:53:37PM +0200, Peter Pentchev wrote:
  Btw, anybody reading this discussion - I tried the attached script with
  #!/usr/bin/perl -wT, and Perl died on the unlink() - "unsafe dependency".
  What gives?
 
 $ man perldiag
 [snip]
    Insecure dependency in %s
    (F) You tried to do something that the tainting
    mechanism didn't like.  The tainting mechanism is
    turned on when you're running setuid or setgid, or
    when you specify -T to turn it on explicitly.  The
    tainting mechanism labels all data that's derived
    directly or indirectly from the user, who is
    considered to be unworthy of your trust.  If any such
    data is used in a "dangerous" operation, you get this
    error.  See the perlsec manpage for more information.
 [snip]
 
 Note that a filename you get from readdir is (indirectly) from the
 user, and unlink counts as dangerous.
 
 Basically, you need to "untaint" $fname in OnePass before using it in
 the unlink call; this is fairly trivial to do, and if you can't work it 
 out from perlsec(1), feel free to contact me off-list.

Whoops.  Yup, thanks.  Updated version attached.

G'luck,
Peter

-- 
Nostalgia ain't what it used to be.

#!/usr/bin/perl -wT
# $Id: procdir.pl,v 1.2 2000/12/27 11:16:38 roam Exp $

use strict;

sub OnePass {
    my $dir = (shift || "");
    my ($fname, @files);

    die("OnePass() requires a dir argument\n") if ($dir eq "");
    opendir(D, $dir) or die("Opening $dir: $!\n");
    @files = readdir(D);
    closedir(D);
    foreach $fname (@files) {
        next if (($fname eq ".") || ($fname eq ".."));
        # more filename vailidity checks go here
        # pattern filtering and subexpression to 'untaint'
        next unless $fname =~ /^([\w\d._-]+\.cvs)$/;
        $fname = $1;

        # ok, we want this file
        print "Processing $dir/$fname\n";

        # done with it..
        unlink("$dir/$fname") or warn("Removing $dir/$fname: $!\n");
        # this is evil - if we could process it, but could not
        # remove it, we might end up processing it again at the next
        # iteration :(
    }
}

sub ProcessDir {
    my $dir = (shift || "");

    die("ProcessDir() requires a dir argument\n") if ($dir eq "");
    for (;;) {
        OnePass($dir);
        # this could be done with select(), with a signal handler,
        # many different ways..  polling and sleep() is easy
        sleep(2);
    }
}

MAIN:{
    # obtain directory name in some way
    my $d = "/tmp";
    ProcessDir($d);
    # er heh.. this should never return :)
    die("ProcessDir() returned?.. $!\n");
}





Re: waiting for new files in a directory

2000-12-27 Thread Peter Pentchev

On Wed, Dec 27, 2000 at 01:18:28PM +0200, Peter Pentchev wrote:
[snip..]
   closedir(D);
   foreach $fname (@files) {
   next if (($fname eq ".") || ($fname eq ".."));
   # more filename vailidity checks go here
^ validity.. *sigh* :P
   # pattern filtering and subexpression to 'untaint'
   next unless $fname =~ /^([\w\d._-]+\.cvs)$/;
   $fname = $1;

G'luck,
Peter

-- 
Thit sentence is not self-referential because "thit" is not a word.





Re: waiting for new files in a directory

2000-12-27 Thread Mark Murray

   unlock the file
   
   The cleaner you mentioned: run it every 15 minutes, compare the 
   date/time on the lockfile, if more than 15 minutes old, grab the PID,  
   and kill the job, remove the lock.
  
  Correct.

Actually, you can make it a lot better:

If the lockfile exists, then kill -0 the PID to see if it is still live.
If not, blow away the lockfile. If still alive and older than N minutes,
blow away the PID and break the lock.
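
A rough sketch of that check in Perl, assuming the lock file holds nothing but the owner's PID on its first line and that the file's mtime marks when the lock was taken. The function name lock_state and its return values are made up for this example.

```perl
use strict;
use warnings;

# Classify a PID-bearing lock file:
#   "free"    - no lock file (or it cannot be opened)
#   "dead"    - lock file exists but its owner is gone or unreadable
#   "overdue" - owner is alive but has held the lock longer than $max_age
#   "held"    - owner is alive and within $max_age seconds
sub lock_state {
    my ($lockfile, $max_age) = @_;
    open(my $fh, '<', $lockfile) or return "free";
    my $pid   = <$fh>;
    my $mtime = (stat($fh))[9];    # element 9 of stat() is the mtime
    close($fh);

    return "dead" unless defined $pid && $pid =~ /^(\d+)\s*$/;
    $pid = $1;

    # kill 0 delivers no signal; it only asks whether the PID exists.
    # EPERM means the process exists but belongs to another user.
    my $alive = kill(0, $pid) || $!{EPERM};
    return "dead" unless $alive;
    return (time() - $mtime > $max_age) ? "overdue" : "held";
}
```

A cleaner run from cron would then unlink the lock file on "dead", and on "overdue" kill the PID first and unlink afterwards. Note that a PID can be reused by an unrelated process, so "held" is a best guess rather than a guarantee.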

M
--
Mark Murray
Warning: this .sig is umop ap!sdn





RE: waiting for new files in a directory

2000-12-27 Thread Koster, K.J.

Dear All,

What you'd really want is some kind of message queueing system for this kind
of work. What message queueing systems are (non-commercially) available on
UNIX systems?

Kees Jan


 You are only young once,
   but you can stay immature all your life.





Re: waiting for new files in a directory

2000-12-27 Thread Alfred Perlstein

* Dan Langille [EMAIL PROTECTED] [001226 23:50] wrote:
 
 My idea is to have a daemon, or something resembling one, sitting on 
 the box watching the directory.  When a new file appears, it starts a perl 
 script.  This perl script is beyond the scope of my question, but it  
 processes all the files in the directory.  When finished, it looks for any 
 more files and repeats as necessary.  If no more files, it exits.
 

This isn't an answer to your main question (i see it's already been
discussed), but you may be able to set up a kevent on the
directory which should inform you if any files are added to it.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."





Re: waiting for new files in a directory

2000-12-27 Thread Peter Pentchev

On Wed, Dec 27, 2000 at 09:16:34AM -0800, Alfred Perlstein wrote:
 * Dan Langille [EMAIL PROTECTED] [001226 23:50] wrote:
  
  My idea is to have a daemon, or something resembling one, sitting on 
  the box watching the directory.  When a new file appears, it starts a perl 
  script.  This perl script is beyond the scope of my question, but it  
  processes all the files in the directory.  When finished, it looks for any 
  more files and repeats as necessary.  If no more files, it exits.
  
 
 This isn't an answer to your main question (i see it's already been
 discussed), but you may be able to set up a kevent on the
 directory which should inform you if any files are added to it.

Unfortunately, I gather that Dan intends to write most of the FreshPorts
code in Perl, and AFAIK, Perl has no kqueue/kevent interface :(
Thus, to make use of kevent (which I certainly agree would be a better
FreeBSD-specific solution), he'd have to either 1. have a C program
which spawns Perl and his script on every change, or 2. have a C program
which spawns Perl once and signals it on every change.

The first way would be downright stupid IMHO..  The second one may
very well be more efficient than the readdir, sleep solution which
I proposed in other postings, seeing that Dan wants to process
the cvs-all mailings, which certainly do not arrive every few seconds :)

As a side-point - does Perl really have a kqueue/kevent interface?
If not, how hard would it be to write a little Perl module to implement
that?  (Unfortunately, I am a complete stranger to Perl modules..)
A Perl script which uses kevent to wait on a directory would certainly
be more efficient than any of the above solutions :)

G'luck,
Peter

-- 
I am jealous of the first word in this sentence.





Re: waiting for new files in a directory

2000-12-27 Thread Jack Rusher


  I was about to write up a group of suggestions that include the notion
that you could use kqueue to watch the directory's vnode, you could use
Erez's stackable file system code to pass all file creates through a
filter, use lpd's spooling mechanism to treat the incoming directory
like a print queue, use a standard issue cron job, etc, etc.  But...

 At present the files are created through procmail like this:
 
 |/usr/bin/perl $HOME/process_cvs_mail.pl > ~/msgs/$FILE

...this fragment tells me that you are in control of the process of
creating these files.  This makes the whole problem much easier to solve
and side steps the issue of watching the directory altogether.

  In addition to the suggestions above, you could also:

  You could set up the message processing daemon to listen on a named pipe
and send the messages there from the process_cvs_mail script.

  You could handle queue entry with the process_cvs_mail script and
queue exit with your daemon; signal the daemon from the script when 
new work appears in the queue.  This would mirror a threaded work queue
approach that blocks on a condition variable until work comes into the
queue.
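
A small sketch of the signal variant, assuming SIGUSR1 is the wake-up signal and that the daemon's PID is available to the script (for example from a pidfile). The names wait_for_work and run_queue are invented for this example. The signal is treated only as a hint: the daemon also rescans after a timeout, so a missed signal costs a delay rather than a lost message.

```perl
use strict;
use warnings;

# Set by the signal handler; checked and cleared by wait_for_work().
my $got_signal = 0;
$SIG{USR1} = sub { $got_signal = 1 };

# Block until SIGUSR1 arrives or $timeout seconds pass.
# Returns 1 if a signal was seen, 0 on a plain timeout.
sub wait_for_work {
    my ($timeout) = @_;
    # sleep() returns early when a signal handler runs.
    sleep($timeout) unless $got_signal;
    my $woken = $got_signal;
    $got_signal = 0;
    return $woken;
}

# Daemon side: rescan after every wake-up, signalled or not.
# $process_queue is a code reference that drains the queue directory.
sub run_queue {
    my ($process_queue, $timeout) = @_;
    for (;;) {
        $process_queue->();
        wait_for_work($timeout);
    }
}

# Producer side (e.g. the last step of process_cvs_mail.pl), once the
# daemon's PID is known:
#     kill('USR1', $daemon_pid);
```

There is a small window between testing the flag and calling sleep() in which a signal is noticed only after the timeout; closing that window properly would need sigprocmask and sigsuspend from the POSIX module, which this sketch leaves out for brevity.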

--
Jack Rusher, Senior Engineer | mailto:[EMAIL PROTECTED]
Integratus, Inc. | http://www.integratus.com








Re: waiting for new files in a directory

2000-12-27 Thread Dan Langille

On 27 Dec 2000, at 12:53, Peter Pentchev wrote:

 Something like..
 | /usr/bin/perl $HOME/process.pl > ~/msgs/$FILE.tmp && \
   mv ~/msgs/$FILE.tmp ~/msgs/$FILE.cvs

Thanks for that.  It's helped me solve a procmail problem I was having.  
The files were 600 instead of 640, so I did this:

|/usr/bin/perl $HOME/process_cvs_mail.pl > ~/msgs/$FILE && chmod o+r ~/msgs/$FILE

Works great.  Cheers.
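
The write-to-.tmp-then-mv idea quoted above can also live inside the Perl script itself. The following is a sketch only, assuming the queue directory and the temporary file are on the same filesystem (rename is only atomic in that case); deliver_message and its arguments are hypothetical names. It sets mode 0640 before the rename so the reader never sees a file with the wrong permissions.

```perl
use strict;
use warnings;

# Write $data to "$dir/$name.tmp", fix its mode, then rename it to
# "$dir/$name.cvs".  A reader that only picks up *.cvs files therefore
# never sees a half-written message.  Returns the final path.
sub deliver_message {
    my ($dir, $name, $data) = @_;
    my $tmp   = "$dir/$name.tmp";
    my $final = "$dir/$name.cvs";

    open(my $out, '>', $tmp) or die "Opening $tmp: $!\n";
    print {$out} $data       or die "Writing $tmp: $!\n";
    close($out)              or die "Closing $tmp: $!\n";
    chmod(0640, $tmp)        or die "Changing mode of $tmp: $!\n";
    rename($tmp, $final)     or die "Renaming $tmp to $final: $!\n";
    return $final;
}
```

Called as deliver_message("$ENV{HOME}/msgs", $id, $message), it would replace both the shell redirection and the separate chmod step.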

--
Dan Langille
The FreeBSD Diary - http://freebsddiary.org/
   FreshPorts - http://freshports.org/
 NZ Broadband - http://unixathome.org/broadband/





Re: waiting for new files in a directory

2000-12-27 Thread Dan Langille

On 27 Dec 2000, at 19:56, Peter Pentchev wrote:

 On Wed, Dec 27, 2000 at 09:16:34AM -0800, Alfred Perlstein wrote:
  * Dan Langille [EMAIL PROTECTED] [001226 23:50] wrote:
   
   My idea is to have a daemon, or something resembling one, sitting on 
   the box watching the directory.  When a new file appears, it starts a perl 
   script.  This perl script is beyond the scope of my question, but it  
   processes all the files in the directory.  When finished, it looks for any 
   more files and repeats as necessary.  If no more files, it exits.
   
  
  This isn't an answer to your main question (i see it's already been
  discussed), but you may be able to set up a kevent on the
  directory which should inform you if any files are added to it.
 
 Unfortunately, I gather that Dan intends to write most of the FreshPorts
 code in Perl, and AFAIK, Perl has no kqueue/kevent interface :(

Unfortunately?  *grin*  FWIW, Most of the existing and new code will be 
PHP based.  Perl is used primarily for importing data from cvs-all.  And 
for various mailings out to users.

 Thus, to make use of kevent (which I certainly agree would be a better
 FreeBSD-specific solution), he'd have to either 1. have a C program
 which spawns Perl and his script on every change, or 2. have a C program
 which spawns Perl once and signals it on every change.
 
 The first way would be downright stupid IMHO..  The second one may
 very well be more efficient than the readdir, sleep solution which
 I proposed in other postings, seeing that Dan wants to process
 the cvs-all mailings, which certainly do not arrive every few seconds :)

I like the 2nd concept.  It appeals to me.  I haven't done any C in about 
7 years and all of that was in Windows.  Never in a Unix environment.  
This solution is more complex than the "cron job every minute" which I 
discussed with Mark, but it fits with my goal of processing the 
cvs-all messages as quickly as I can.

--
Dan Langille
The FreeBSD Diary - http://freebsddiary.org/
   FreshPorts - http://freshports.org/
 NZ Broadband - http://unixathome.org/broadband/





Re: waiting for new files in a directory

2000-12-27 Thread Dan Langille

On 27 Dec 2000, at 11:25, Jack Rusher wrote:

  At present the files are created through procmail like this:
  
  |/usr/bin/perl $HOME/process_cvs_mail.pl > ~/msgs/$FILE
 
 ...this fragment tells me that you are in control of the process of
 creating these files.  

That is correct.

 This makes the whole problem much easier to solve
 and side steps the issue of watching the directory altogether.
 
   In addition to the suggestions above, you could also:
 
   You could set up the message processing daemon to listen on a named pipe
 and send the messages there from the process_cvs_mail script.
 
   You could handle queue entry with the process_cvs_mail script and
 queue exit with your daemon; signal the daemon from the script when 
 new work appears in the queue.  This would mirror a threaded work queue
 approach that blocks on a condition variable until work comes into the
 queue.

Will this approach tie up the procmail script?  I want the MTA to be 
freed up ASAP.  That's one of the primary reasons for wanting separate 
processes. From time to time, the website can be "flooded" with 
messages.  This is usually the result of the website being offline or 
otherwise disconnected from the net.  The mail builds up and then 
arrives all at once.  That's the reason for freeing up the MTA quickly.

--
Dan Langille
The FreeBSD Diary - http://freebsddiary.org/
   FreshPorts - http://freshports.org/
 NZ Broadband - http://unixathome.org/broadband/





waiting for new files in a directory

2000-12-26 Thread Dan Langille

FreshPorts2 will have a new processing strategy for incoming 
messages.  Each message will be in a separate file in a predetermined 
directory. As each file arrives, it is processed by a perl script.  I want 
only one instance of that perl script running at a given time.  This is 
primarily for serialization and to ensure the system doesn't get bogged 
down running perl scripts if many messages arrive in a short period of 
time.

My idea is to have a daemon, or something resembling one, sitting on 
the box watching the directory.  When a new file appears, it starts a perl 
script.  This perl script is beyond the scope of my question, but it  
processes all the files in the directory.  When finished, it looks for any 
more files and repeats as necessary.  If no more files, it exits.

If a file arrives, the daemon checks to see if the perl script is already 
running.  If so, it doesn't start another one.

Any ideas on how to do this?  Any suggestions on the process?
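
One common way to get the "only one instance at a time" part, sketched here under the assumption that an advisory flock() lock on a well-known file is acceptable. The lock file path and the name acquire_single_instance are placeholders for this example. The kernel drops the lock when the process exits, so there is no stale lock to clean up after a crash.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Try to become the only running instance.  Returns the open lock
# handle on success (keep it in scope for as long as the lock is
# needed), or undef if another process already holds the lock.
sub acquire_single_instance {
    my ($lockfile) = @_;
    # Open for append so an existing lock file is never truncated.
    open(my $fh, '>>', $lockfile) or die "Opening $lockfile: $!\n";
    # LOCK_NB makes flock() fail at once instead of waiting.
    return flock($fh, LOCK_EX | LOCK_NB) ? $fh : undef;
}
```

The processing script would call this first and exit quietly if it gets undef, which means whatever starts the script (cron, a daemon, procmail) can do so as often as it likes without piling up Perl processes.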

--
Dan Langille
The FreeBSD Diary - http://freebsddiary.org/
   FreshPorts - http://freshports.org/
 NZ Broadband - http://unixathome.org/broadband/

