At 11:34 AM 6/2/04 -0700, Bill Moseley wrote:
> On Wed, Jun 02, 2004 at 12:43:00PM -0500, Timm Murray wrote:
> > At 09:45 AM 6/2/04 -0700, Bill Moseley wrote:
> >
> > >> I don't think either solution is particularly difficult to implement,
> > >> but scanning the content files directly also lets us have an easier
> > >> time analyzing the structure of the document.
> > >
> > >All the server does is supply the content.  Analyzing the content
> > >happens after that, regardless of using the server or the file system.
> > >Spidering lets you index the content as people see it on their browser.
> >
> > Take a look at the system being
> > used:  http://www.perlmonks.org/index.pl?node_id=357248, particularly the
> > 'Documents' subsection.

> Seems like you could outgrow that one.  Also seems like something
> that's been done already in many forms.  You request the .inc file and
> it gets transformed by the .tmpl file.  XML + XSLT?  SSI?  I think you
> can do better.

XML/XSLT is a mess, as is SSI. In fact, replacing SSI was exactly the goal with this system.


I implemented a small site with SSI and then did the same site with Apache::QuickCMS. The result was a sharp reduction in size: the SSI version was 112k and 409 lines, while the Apache::QuickCMS version was 32k and 190 lines. (I can send the tarball of both sites as implemented off-list if anyone wants to take a look.)

I actually started with the content files in an XML format. However, I ran into problems convincing the XML parser that the embedded HTML should be treated as a string (so it could be put into the template parameter) rather than as more XML to be parsed. While looking for a way around this, I thought up the POD-like solution. I recoded it without any of the problems XML gave me, and it's probably faster, too.
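
To give a rough idea of the approach (this is just a simplified sketch; the "=param" directive and the file names are made up for illustration, not the exact Apache::QuickCMS syntax): split the content file on directive lines, keep each section as a raw string, and hand the whole lot to HTML::Template as params, so the embedded HTML never goes near a parser:

  # Simplified sketch of a POD-like content parser.  "=param NAME" starts a
  # section; everything up to the next directive is kept as a plain string,
  # so embedded HTML is passed straight through to the template.
  use strict;
  use warnings;
  use HTML::Template;

  sub parse_content {
      my ($file) = @_;
      open my $fh, '<', $file or die "Can't open $file: $!";
      my (%params, $current);
      while (my $line = <$fh>) {
          if ($line =~ /^=param\s+(\w+)\s*$/) {
              $current = $1;
              $params{$current} = '';
          }
          elsif (defined $current) {
              $params{$current} .= $line;   # raw text, embedded HTML and all
          }
      }
      close $fh;
      return \%params;
  }

  my $params   = parse_content('about.inc');
  my $template = HTML::Template->new(
      filename          => 'page.tmpl',
      die_on_bad_params => 0,   # ignore sections the template doesn't use
  );
  $template->param(%$params);
  print $template->output;

A content file is then just "=param title", a line of text, "=param body", a chunk of HTML, and so on.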

I wasn't happy about having to allow <TMPL_INCLUDE> tags inside the content, as I fear it could easily be abused in naive ways. It also slows things down quite a bit, since it requires two passes through HTML::Template (at least, that's how it's currently implemented). However, for some of our data, I found we simply didn't have another choice.
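
Roughly, the two passes look like this (a simplified sketch, not the actual module code; the file names are invented):

  # Two-pass approach: run the content section through HTML::Template first,
  # so TMPL_INCLUDE (and any other TMPL_* tag) gets expanded, then hand the
  # expanded HTML to the page template as an ordinary parameter.
  use strict;
  use warnings;
  use HTML::Template;

  my $raw_body = '<p>Main content here.</p>' . "\n"
               . '<TMPL_INCLUDE NAME="boilerplate.inc">' . "\n";

  # Pass 1: treat the content itself as a template and expand its tags.
  my $pass1 = HTML::Template->new(
      scalarref => \$raw_body,
      path      => ['/var/www/includes'],   # where the included files live
  );
  my $expanded = $pass1->output;

  # Pass 2: fill the page template with the expanded content.
  my $page = HTML::Template->new(filename => 'page.tmpl');
  $page->param(body => $expanded);
  print $page->output;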


> You probably already did this, but you might want to review other CMSes if
> you're redesigning the site from scratch.  Here are a few lists:
>
>   http://www.cmsmatrix.org/
>   http://www.oscom.org/matrix/

I've looked at a lot of CMSes; it's one of those massively over-implemented genres, much like templating systems :) I suspect the reason is that people look at other CMSes, decide they do almost but not quite exactly what they want, and end up implementing their own. So I decided I would add to the mix :) Yes, it's simple; that's intentional. I hope (possibly in vain) that I can avoid the feeping creaturism that tends to plague other CMSes.



> There's also PHP.

PHP has other problems.


> > Now, the system allows TMPL_INCLUDE tags in the content files (actually,
> > it's implemented by passing it through HTML::Template a second time, so any
> > TMPL_* tag will be processed, but this might change). Included files
> > occasionally need to be part of the search, but most likely won't. But I
> > don't feel I can make that assumption in all cases. So I need some way of
> > saying which ones should be searched on if we should ever need that
> > functionality (but default to not searching).


> And you also need a system to process your template files, like
> Apache::QuickCMS does, so you can index.  Give spidering a try; you may
> find it's not as inefficient as you think.  libxml2 is damn fast.

The processing stage isn't hard (remember, simplicity is a goal of Apache::QuickCMS), and in any case, I think I can modify Apache::QuickCMS quite easily so that the processing stage can be used directly by another program. So I wouldn't need to write a separate processor to run before the indexer.
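
Something along these lines is what I have in mind: one plain rendering function that both the mod_perl handler and a command-line indexing script can call (the function name and the HTML::TokeParser tag-stripping below are illustrative, not anything Apache::QuickCMS currently exposes):

  use strict;
  use warnings;
  use HTML::Template;
  use HTML::TokeParser;

  # One shared entry point: content file in, finished HTML out.
  sub render_file {
      my ($content_file, $template_file) = @_;
      open my $fh, '<', $content_file or die "Can't open $content_file: $!";
      my $body = do { local $/; <$fh> };   # slurp the content file
      my $tmpl = HTML::Template->new(
          filename          => $template_file,
          die_on_bad_params => 0,
      );
      $tmpl->param(body => $body);
      return $tmpl->output;
  }

  # What an indexer feeder might do with it: render, strip the tags, and
  # hand the plain text to the indexer (here it's just printed).
  my $html   = render_file('about.inc', 'page.tmpl');
  my $parser = HTML::TokeParser->new(\$html);
  my $text   = '';
  while (my $token = $parser->get_token) {
      $text .= $token->[1] . ' ' if $token->[0] eq 'T';   # 'T' = text token
  }
  print $text, "\n";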


I'm not really concerned about spidering being inefficient. The worst case I can imagine is that I set it to run before I leave one day and it's done when I come in the next morning. It just seems to me that it's a clumsier solution to this problem.

(In fact, I do have a spider which runs through our entire site and jots down which pages link to which other pages. It takes 5-10 minutes to run. The resulting report is dumped in YAML format and processed by other programs to generate various reports, such as which pages link to documents that don't exist. That saved us a lot of time, because the boss wanted our current site mapped by hand before we got to the redesign. Now we have a report that's useful for stuff well beyond redesigns, the least of which is the printed version (1500 pages, double-sided), which is handy for my boss to carry into meetings to justify why we need a redesign :)
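
For the curious, the core of that spider isn't much more than this (a stripped-down sketch; the real one does more bookkeeping, and the start URL is a placeholder):

  use strict;
  use warnings;
  use LWP::UserAgent;
  use HTML::LinkExtor;
  use URI;
  use YAML qw(Dump);

  my $start = 'http://www.example.com/';   # placeholder start URL
  my $ua    = LWP::UserAgent->new;
  my (%seen, %links_from);
  my @queue = ($start);

  while (my $url = shift @queue) {
      next if $seen{$url}++;
      my $res = $ua->get($url);
      next unless $res->is_success && $res->content_type eq 'text/html';

      my $extractor = HTML::LinkExtor->new;
      $extractor->parse($res->decoded_content);

      for my $link ($extractor->links) {
          my ($tag, %attr) = @$link;
          next unless $tag eq 'a' && defined $attr{href};
          my $target = URI->new_abs($attr{href}, $res->base)->canonical;
          $target->fragment(undef);            # ignore #fragments
          push @{ $links_from{$url} }, $target->as_string;
          push @queue, $target->as_string
              if $target->as_string =~ /^\Q$start\E/;   # stay on our own site
      }
  }

  # page => [ pages and documents it links to ], ready for other programs
  print Dump(\%links_from);

The broken-link report is then just a matter of another program walking the YAML and checking which link targets never turned up as real pages.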



> --
> Bill Moseley
> [EMAIL PROTECTED]


