RE: Module for simple processing of log files

2005-03-30 Thread Orton, Yves

 On Tuesday 29 March 2005 at 17:52, Orton, Yves wrote:
  
  I started working on a project like this but never got around to finishing
  it. I called it Generic Record Processing System, i.e. GRPS. The point being
  that this isn't a facility related to parsing log files, it's a facility
  relating to processing any file of parsable records in a mechanical way.
 
 Then what do you think of Record::Processor?


Great. Although you might want to take a little bit of time to think about how you would subdivide that space. For instance, I could imagine:

 Record::Processor::Parser
 Record::Processor::Writer
 Record::Processor::Writer::XML
 Record::Processor::Writer::xSV
 Record::Processor::Writer::Packed
 Record::Processor::Reader::XML
 Record::Processor::Reader::xSV
 Record::Processor::Reader::Packed


... Etc...


If the framework makes sense, it should be fairly easy to extend it for new data representations, output formats and the like. For instance, if I have some kind of specially encoded records that need to be preprocessed before your framework can run on them, it should be fairly easy to add a new subclass and have it DWIM.

Also, when I say these are classes, what I'm thinking is that they encapsulate the knowledge about how to convert a rule specification into _source_code_. I'm not thinking that they should have methods that are executed inside the parse loop. IMO there shouldn't be ANY subroutines inside the parse loop. That way the resulting parser is lean, mean and fast: no method lookup BS or subroutine call stack overhead.
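
To illustrate the code-generation idea, here is a minimal sketch in Python (not Perl, and not anything from Record::Processor or GRPS). The rule-spec format (name, start, end tuples for fixed-width columns) and the names build_parser_source and compile_parser are assumptions made up for this example. The builder turns the spec into source text once, and the generated loop body contains only inlined slices, with no function or method calls.

```python
def build_parser_source(fields):
    """Turn a field spec into Python source for a generator function.

    `fields` is a hypothetical rule spec: (name, start, end) tuples
    describing fixed-width columns; end=None means "to end of line".
    """
    items = []
    for name, start, end in fields:
        stop = "" if end is None else str(end)
        items.append(f"{name!r}: line[{start}:{stop}]")
    body = ", ".join(items)
    # The generated loop has no calls in it: every rule is an inlined slice.
    return (
        "def parse(lines):\n"
        "    for line in lines:\n"
        f"        yield {{{body}}}\n"
    )


def compile_parser(fields):
    """Compile the generated source and return the resulting parse function."""
    namespace = {}
    exec(build_parser_source(fields), namespace)
    return namespace["parse"]


# Hypothetical spec for a log line such as "2005-03-30 ERROR disk full".
spec = [("date", 0, 10), ("level", 11, 16), ("msg", 17, None)]
parse = compile_parser(spec)
records = list(parse(["2005-03-30 ERROR disk full"]))
```

The classes that know about a format only run at build time; a new format would supply a different source fragment rather than a method called per record.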

Anyway, as I said, I look forward to seeing your work. :-)
Yves





TRIEs in the core (was: Re: Module for simple processing of log files)

2005-03-30 Thread David Landgren
Orton, Yves wrote:
[...]
<shameless plug>
But David and the other Regexp authors need to update their code to take 
advantage of 5.9.2 and later innate TRIE optimisation. They still have 
room for optimising the patterns that they build but they will need to 
build fairly different looking patterns to really harness the TRIE regop.

</shameless plug>
No, I've been following the threads on p5p. I've been looking hard at 
the stuff I do, and the patterns I generate come from little patterns 
that all tend to feature lots of metacharacters (otherwise I'd be doing 
hash lookups or index()). Correct me if I'm wrong, but such patterns 
don't benefit from your trie optimisations. E.g., what happens with

FROM MRS\. [A-Z]+ [A-Z]+
FROM MRS [A-Z]+ [A-Z]+
FROM MR [A-Z]+ [A-Z]+
FROM MR\. [A-Z]+ [A-Z]+
FROM: MRS\. [A-Z]+ [A-Z]+
FROM: MRS [A-Z]+ [A-Z]+
FROM: MR [A-Z]+ [A-Z]+
FROM: MR\. [A-Z]+ [A-Z]+
(actual patterns lifted from Nigerian spam)? Regexp::Assemble (R::A) produces
FROM:? MRS?\.? [A-Z]+ [A-Z]+
instead of the whole mess or'ed together. I'm seriously lacking time to 
benchmark the differences.
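
As a quick sanity check, the sketch below uses Python's re module as a stand-in (it does not call Regexp::Assemble, and it says nothing about Perl's TRIE regop or relative speed). It compares the eight patterns or'ed together against the single assembled pattern on a few sample strings. The sample strings are invented for the example.

```python
import re

# The eight original patterns, copied from the message above.
variants = [
    r"FROM MRS\. [A-Z]+ [A-Z]+",
    r"FROM MRS [A-Z]+ [A-Z]+",
    r"FROM MR [A-Z]+ [A-Z]+",
    r"FROM MR\. [A-Z]+ [A-Z]+",
    r"FROM: MRS\. [A-Z]+ [A-Z]+",
    r"FROM: MRS [A-Z]+ [A-Z]+",
    r"FROM: MR [A-Z]+ [A-Z]+",
    r"FROM: MR\. [A-Z]+ [A-Z]+",
]

# "The whole mess or'ed together" versus the assembled form.
alternation = re.compile("|".join(f"(?:{v})" for v in variants))
assembled = re.compile(r"FROM:? MRS?\.? [A-Z]+ [A-Z]+")

# Invented samples: some should match, some should not.
samples = [
    "FROM MRS. MARY SMITH",
    "FROM: MR JOHN DOE",
    "FROM MR. JOHN DOE",
    "FROM: MRS JANE DOE",
    "FROM MS JANE DOE",    # "MS" is not covered by either form
    "FROM:MR JOHN DOE",    # missing space after the colon
    "FROM MRS JANE",       # only one name
]

# True when both forms accept and reject exactly the same samples.
agree = all(
    bool(alternation.fullmatch(s)) == bool(assembled.fullmatch(s))
    for s in samples
)
```

The two forms accept the same strings because the three optional pieces (colon, S, dot) cover exactly the 2 x 2 x 2 = 8 listed combinations.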

David


Re: Should DSLIP codes be updated?

2005-03-30 Thread Ricardo SIGNES
* Robert Rothenberg [EMAIL PROTECTED] [2005-03-29T18:03:09]
 On 29/03/2005 22:14 Andy Lester wrote:
 
 Or thrown away entirely, along with the rest of the archaic idea of
 module registration.
 
 I'm sympathetic to the idea, but some of the information in DSLIP is 
 useful and shouldn't be thrown away (such as how supported, 
 alpha/beta/mature, and license). What isn't in META.yml should go there.

I assume you mean "What isn't in META.yml should go in DSLIP."

Why not "What isn't in META.yml should go in META.yml"?

No reason every module that wants to provide this information can't.

-- 
rjbs




Re: Should DSLIP codes be updated?

2005-03-30 Thread Smylers
Ricardo SIGNES writes:

 I assume you mean "What isn't in META.yml should go in DSLIP."
 
 Why not "What isn't in META.yml should go in META.yml"?

META.yml sounds much more sensible to me.  It wasn't around when DSLIP
was created, but it is now.

Of course, even if we change _where_ this metadata is stored, we still
have to address Robert's original points about the data itself.

Smylers