Emiliano Heyns wrote:
> 
> > If so, it is very good. I'd like
> > to add some LaTeX-like formatting after 1.2.6 is released to make life
> > easier for scientific applications of Midgard. The current parser implementation
> > has some limitations in its extension scheme (you need to rewrite the parser in
> > at least two or three places in different packages), so modularizing it would
> > be great. Actually, I already have LaTeX-like formatting done in PHP using
> > regular expressions, and support for PCRE in the text parser would possibly be
> > useful. I mean that text formatting modules could use a generalized API for
> > accessing the PCRE library, as is done with DB support.
> 
> Seems like you already put some thought into this. I would welcome this
> concept, so if you have ideas on this API I'd gladly discuss the ramifications.

Actually, I'm just starting to crystallize these ideas into a somewhat
solid form. But from what I have experienced before, this API should
at least make it possible to:
- dynamically load/unload the wanted parsers;
- give one parser access to functions exported by another one;
- operate on text content in terms of text objects (i.e. paragraph,
word, page...). The latter could be done with an internal parser (equal in
rights to the other ones) that partitions the text and returns each
part much like a fetch from the DB works;
- chain parsers together, creating a new parser as a queue of existing
parsers, much like effects in graphics editors.
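
To make the first two points of the list above a bit more concrete, here
is a minimal sketch in C of a parser registry. Every name in it
(parser_load, parser_unload, parser_lookup, struct parser, the "words"
parser) is hypothetical and not part of midgard-lib. It also only
registers parsers compiled into the same program; a real implementation
would presumably pull each parser from a shared object with
dlopen()/dlsym() instead.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; nothing here exists in midgard-lib. */
#define MAX_PARSERS 16
#define MAX_EXPORTS 8

typedef void (*export_fn)(void);   /* generic pointer, cast back by the caller */

struct parser_export {
    const char *name;
    export_fn   fn;
};

struct parser {
    const char          *name;
    struct parser_export exports[MAX_EXPORTS];
    size_t               n_exports;
};

static const struct parser *registry[MAX_PARSERS];

/* "Load" a parser: 0 on success, -1 if the name is taken or the table is full. */
int parser_load(const struct parser *p)
{
    size_t i, free_slot = MAX_PARSERS;

    assert(p != NULL && p->name != NULL);
    for (i = 0; i < MAX_PARSERS; i++) {
        if (registry[i] == NULL) {
            if (free_slot == MAX_PARSERS)
                free_slot = i;
        } else if (strcmp(registry[i]->name, p->name) == 0) {
            return -1;
        }
    }
    if (free_slot == MAX_PARSERS)
        return -1;
    registry[free_slot] = p;
    return 0;
}

/* "Unload" a parser by name: 0 on success, -1 if it was not loaded. */
int parser_unload(const char *name)
{
    size_t i;

    for (i = 0; i < MAX_PARSERS; i++) {
        if (registry[i] != NULL && strcmp(registry[i]->name, name) == 0) {
            registry[i] = NULL;
            return 0;
        }
    }
    return -1;
}

/* Let one parser find a function exported by another; NULL if not found. */
export_fn parser_lookup(const char *parser_name, const char *fn_name)
{
    size_t i, j;

    for (i = 0; i < MAX_PARSERS; i++) {
        const struct parser *p = registry[i];
        if (p == NULL || strcmp(p->name, parser_name) != 0)
            continue;
        for (j = 0; j < p->n_exports; j++)
            if (strcmp(p->exports[j].name, fn_name) == 0)
                return p->exports[j].fn;
    }
    return NULL;
}

/* Toy parser that exports a single helper counting whitespace-separated words. */
static size_t count_words(const char *s)
{
    size_t n = 0;
    int in_word = 0;

    for (; *s != '\0'; s++) {
        if (isspace((unsigned char)*s)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            n++;
        }
    }
    return n;
}

const struct parser words_parser = {
    "words", { { "count_words", (export_fn)count_words } }, 1
};
```

Another parser would call parser_lookup("words", "count_words"), cast the
result back to size_t (*)(const char *), and use it without knowing
anything else about the "words" parser.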

In this situation, for example, support for PCRE's regular expressions
would be just another parser.
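
As an illustration of a regular-expression "parser", here is a small
sketch that rewrites LaTeX-like \emph{...} into <em>...</em>. Note the
assumptions: it uses the POSIX <regex.h> functions as a stand-in for
PCRE so that it needs nothing outside libc, and the function name
emph_parser and the \emph markup are invented for the example.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser entry point: returns a newly allocated copy of
   `in` with every \emph{text} replaced by <em>text</em>, or NULL on
   error. The caller frees the result. POSIX regex stands in for PCRE. */
char *emph_parser(const char *in)
{
    regex_t re;
    regmatch_t m[2];
    const char *p = in;
    char *out, *w;

    assert(in != NULL);

    /* Each match grows the text by 2 bytes and consumes at least 7,
       so 2 * strlen(in) + 1 is a safe upper bound for the output. */
    out = malloc(2 * strlen(in) + 1);
    if (out == NULL)
        return NULL;
    w = out;

    /* ERE: a literal backslash, "emph", '{', a group without '}', then '}'. */
    if (regcomp(&re, "\\\\emph[{]([^}]*)[}]", REG_EXTENDED) != 0) {
        free(out);
        return NULL;
    }

    while (regexec(&re, p, 2, m, 0) == 0) {
        size_t body = (size_t)(m[1].rm_eo - m[1].rm_so);

        memcpy(w, p, (size_t)m[0].rm_so);   /* text before the match */
        w += m[0].rm_so;
        memcpy(w, "<em>", 4);
        w += 4;
        memcpy(w, p + m[1].rm_so, body);    /* the emphasized text */
        w += body;
        memcpy(w, "</em>", 5);
        w += 5;
        p += m[0].rm_eo;                    /* continue after the match */
    }
    strcpy(w, p);                           /* remaining tail plus NUL */

    regfree(&re);
    return out;
}
```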

It seems that this should be done at the low level of Midgard, in midgard-lib.
Then we will have an open system which operates with parsers just like PHP
operates with external modules. The main difference will be in the
language in which a parser's functions are accessible: it will be the C
source of the parser. Thus, we have no headache with syntactic and lexical
analyzers. Instead, at the high level (for example, in a PHP script) we will
operate with queues of parsers rather than with their functions. At this
level you may see a parser as a single-function "black box" that accepts
information and transforms it into another format. The last parser in each
queue will thus be the one that outputs a Net-wide format (HTML, XML, etc.,
or, for example, PDF).
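
The queue idea could look roughly like the following C sketch. It
assumes, purely for illustration, that every parser is a function taking
text and returning a newly allocated transformed copy; the names
parser_fn, parser_queue, queue_run, strip_stage and html_stage are all
made up. The last stage here is the one that emits HTML.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical "black box" signature: text in, newly allocated text out
   (NULL on failure). */
typedef char *(*parser_fn)(const char *in);

struct parser_queue {
    const parser_fn *stages;
    size_t           n;
};

/* Feed `in` through every stage in order. Returns the final output
   (caller frees) or NULL if any stage fails. */
char *queue_run(const struct parser_queue *q, const char *in)
{
    char *cur;
    size_t i;

    assert(q != NULL && in != NULL);
    cur = malloc(strlen(in) + 1);
    if (cur == NULL)
        return NULL;
    strcpy(cur, in);

    for (i = 0; i < q->n; i++) {
        char *next = q->stages[i](cur);
        free(cur);                 /* the intermediate result is no longer needed */
        if (next == NULL)
            return NULL;
        cur = next;
    }
    return cur;
}

/* Toy stage: trim leading and trailing whitespace. */
char *strip_stage(const char *in)
{
    const char *end;
    size_t n;
    char *out;

    while (isspace((unsigned char)*in))
        in++;
    end = in + strlen(in);
    while (end > in && isspace((unsigned char)end[-1]))
        end--;
    n = (size_t)(end - in);

    out = malloc(n + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, in, n);
    out[n] = '\0';
    return out;
}

/* Toy final stage: wrap the text in an HTML paragraph. */
char *html_stage(const char *in)
{
    size_t n = strlen(in);
    char *out = malloc(n + 8);     /* "<p>" + text + "</p>" + NUL */

    if (out == NULL)
        return NULL;
    memcpy(out, "<p>", 3);
    memcpy(out + 3, in, n);
    memcpy(out + 3 + n, "</p>", 5); /* copies the terminating NUL too */
    return out;
}
```

At the high level only the queue is visible: build an array such as
{ strip_stage, html_stage }, wrap it in a struct parser_queue, and call
queue_run() on the input text.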

Of course, we'll have a very long way to go to implement this idea, but...
:-)
> 
> > Then creating parsers
> > would be more efficient (it could be relatively simple to create a parser for
> > a Word-like format, as such apps have already been done in Perl and C). Thus,
> > we could achieve the same feature set that proprietary systems (like NPS) sell
> > for thousands of dollars. It would be especially great if those parsers could be
> > dynamically loaded (hence, optimisation for memory footprint would be very
> > effective). Also, a real document flow is impossible without those things.
> 
> Hmm, nice. But the current parser does all its work in memory; I'm not sure
> I'd want to serve many Word documents like this concurrently. If we want to
> use this we may want to have a way to circumvent this.
I mean that the parser will be extractable from the core of the system. But
once it is loaded at the start of Midgard, it works just like the current
parser, translating resources into one of the net-wide formats. I just want
to be able to select the needed parsers individually when a task requires
them.

-- 
Sincerely yours, 
Alexander Bokovoy 
<!-- 2:450/144.58 --- bokovoyATminsk.lug.net --- FractalsAtTheEdge -->

--
This is The Midgard Project's mailing list. For more information,
please visit the project's web site at http://www.midgard-project.org
