Michael Wechner <[EMAIL PROTECTED]> wrote:
> Andreas Hartmann wrote:
> > Michael Wechner wrote:
> >> Andreas Hartmann wrote:
> >>> the paths in crawler.xconf and lucene.xconf are resolved
> >>> from different locations (execution directory vs. config file
> >>> directory).
> >> Are you sure? I think paths in both configuration files are resolved
> >> relative to the config file directory, i.e. where crawler.xconf and
> >> lucene.xconf are located.
> > Strange, that didn't work for me. I started using the config
> > from the website:
> you're right. I guess nobody really realized it yet, because people might
> have specified absolute paths within the config. Actually the path resolving
> was implemented within the Configuration class, but the crawler didn't
> use it.

The ConfigurableIndexer is fantastic.  Its main purpose is to add
fields to be searched or displayed.  That is unnecessary (and maybe
useless) for crawling or indexing HTML.

> I have fixed it and it should work now.
> The crawler file might still need to be specified as absolute path:
> ant -f crawl_and_index.xml crawl -Dcrawler.xconf=/foo/bar/crawler.xconf
> but I am actually not sure about it, but it won't hurt ;-)

Relative paths work, but the example instructions may not match the
current directory structure:
ant -f ../../build/lenya/webapp/lenya/bin/crawl_and_index.xml \
  -Dlucene.xconf=../../build/lenya/webapp/lenya/pubs/default/config/search/lucene-live.xconf \
  index
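
To make the two resolution bases concrete, here is a minimal sketch in
Python (not Lenya's Java code).  The helper name resolve_config_path and
the example paths are made up for illustration.  The only assumption is
that a relative path read from a config file should be resolved against
the directory holding that file, while absolute paths are left alone.

```python
from pathlib import PurePosixPath


def resolve_config_path(config_file: str, value: str) -> str:
    """Resolve a path read from a config file (hypothetical helper).

    Absolute values are returned unchanged; relative values are joined to
    the directory containing the config file, not the current directory.
    """
    path = PurePosixPath(value)
    if path.is_absolute():
        return str(path)
    base = PurePosixPath(config_file).parent
    return str(base / path)


if __name__ == "__main__":
    conf = "/foo/bar/crawler.xconf"
    print(resolve_config_path(conf, "work/index"))
    print(resolve_config_path(conf, "/var/lenya/index"))
```

With this rule the result no longer depends on the directory ant is
started from, which is the behaviour Andreas expected.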

> >> I don't think we should remove the configuration file, because the
> >> dump and the index can become huge (depending on what you are
> >> indexing) and shouldn't reside within the application directory.
> > This makes sense. But then I'd rather have a single configuration point
> > of the index location for the complete Lenya webapp, which could default
> > to the container's work directory.

crawler.xconf does not care about the publication.
lucene-live.xconf needs to set the dump and index directories.

Whether the index is global for all publications is a design decision.
It seemed to me that one publication = one website.  If so, a global
index is a poor fit: it means one huge index filtered by publication at
runtime.  If the index is publication-specific, each index is much
smaller, but multiple-publication searches are not possible.

I believe Lenya should default to indexing each publication
separately.  Security is based on the publication; checking whether a
visitor is allowed to access each document across multiple publications
would really complicate the code.  (Is it possible to log in to
multiple publications?  How would the search results decide which
Identity to use?)

Using one index for 2+ publications could be handled by setting the
publications to use the same index directory with incremental updates.
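
As a rough sketch of that idea (in Python, with made-up names; this is
not an existing Lenya function): each publication maps to an index
directory, defaulting to its own, and two publications share an index
simply by pointing at the same directory.  The "work/search" root and
the overrides parameter are assumptions for illustration only.

```python
from collections import defaultdict


def group_by_index(publications, overrides=None, root="work/search"):
    """Map each index directory to the publications that write to it.

    `overrides` (hypothetical) lets a publication name an explicit index
    directory; otherwise each publication gets its own under `root`.
    """
    overrides = overrides or {}
    groups = defaultdict(list)
    for pub in publications:
        index_dir = overrides.get(pub, f"{root}/{pub}/index")
        groups[index_dir].append(pub)
    return dict(groups)


if __name__ == "__main__":
    shared = "work/search/shared/index"
    print(group_by_index(["default", "blog", "intranet"],
                         {"default": shared, "blog": shared}))
```

Publications that end up in the same group would need incremental
updates so that one does not overwrite the other's entries.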

** The "one publication = one website" assumption falls apart if
multiple publications are used for different applications, e.g. one
website that includes a "default" website application and a "blog"
application.  Security and most configuration are publication-specific,
so using multiple publications requires much customization to move the
{pub}/config directory to global.  (Does the new "plug-in" architecture
make it easier to have different functionality within one publication?)

> well, one could merge the two files and use a "fallback" ...
> Michi

It is more important to be able to call the indexer from XSP than to
change the current configuration structure.  (I was tempted to move
all the publication-specific configuration to a single directory, but
kept the current structure to reduce changes.)

solprovider
