Hey,
        I sent a file called libhtdig-3.2.0.b4.tgz to Gilles and asked him
to stick it in the 'contrib' directory.

        It's part of a larger project to restructure the code for these
goals:
1.  All htdig code will be contained in the library
2.  This library will be able to index data & answer queries through an API
callable from PHP or any other cgi-bin or program.


        Unpack it in your base htdig directory. It contains these files:
libhtdig/prepare.sh
libhtdig/Makefile
libhtdig/libhtdig_htdig.cc
libhtdig/libhtdig_htmerge.cc
libhtdig/libhtdig_api.h

After running configure & make on the latest snapshot, you can run
prepare.sh and then make within the new 'libhtdig' directory.

prepare.sh copies the htdig/htdig source files to this directory.
libhtdig_htdig.cc is a callable replacement for the htdig executable.
libhtdig_htmerge.cc is a callable replacement for the htmerge executable.

The Makefile compiles these files using htdig's standard method, then
links them, together with all files necessary for building the 6 htdig .so
files, into ONE libhtdig.3.2.0.so.

Here's an example of its use:

    htdig_parameters_struct  htdig_params;

    htdig_params.debug = 1;
    htdig_params.initial = TRUE;
    htdig_params.create_text_database = FALSE;
    htdig_params.report_statistics = FALSE;
    htdig_params.alt_work_area = FALSE; 

    strcpy(htdig_params.configFile, "/etc/htdig/htdig.conf");
    strcpy(htdig_params.credentials, "");
    strcpy(htdig_params.max_hops, "");     // 9 digit limit
    strcpy(htdig_params.minimalFile, "");
    strcpy(htdig_params.URL, "");          // stdin HTTP addrs

    htdig_index_open(&htdig_params);
    htdig_index_urls();
    htdig_index_close();
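
Since the string parameters above are filled with strcpy, a caller may want
a bounded copy.  Here is a minimal sketch of one, assuming the fields are
fixed-size char arrays.  The demo_params struct and the set_field helper
are hypothetical stand-ins and are not part of libhtdig_api.h; the real
field sizes may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for htdig_parameters_struct (sizes are assumptions).
struct demo_params {
    char configFile[256];
    char max_hops[10];   // room for 9 digits plus the terminating '\0'
};

// Copy src into a fixed-size char array field.
// Returns false and leaves dst as an empty string if src does not fit.
template <std::size_t N>
bool set_field(char (&dst)[N], const char *src) {
    std::size_t len = std::strlen(src);
    if (len >= N) {
        dst[0] = '\0';
        return false;
    }
    std::memcpy(dst, src, len + 1);   // include the terminator
    return true;
}
```

With this, set_field(params.max_hops, "123456789") succeeds, while a
10-digit string is rejected instead of overrunning the buffer.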


Here's the TODO:

1. Generalize the 'Retriever' class in htdig-exe
        It would be nicer to have a base class for the Retriever class,
the current class would be inherited from the new base class.  This would
enable developers to create their own retrievers and parsers and be able
to mix and match them.  Currently the parser classes receive a Retriever
object as a parameter and issue callback-style calls to the Retriever
object.

        You could derive new classes from the current Retriever object,
but you would carry around all kinds of junk that may be unneeded if you
are indexing documents from other sources.

For example, indexing from another source might look like:
    htdig_index_open(&htdig_params);
    htdig_index_document(.....);
    htdig_index_close();
    htdig_merge(&htmerge_params);    //merges new index with existing index.
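
To make the base-class idea in item 1 concrete, here is a rough sketch.
All names and signatures (RetrieverBase, LocalRetriever, PlainTextParser,
got_word, got_href) are hypothetical and simplified for illustration;
they are not the actual htdig classes.  The point is only that the parser
talks to an abstract interface, so a lightweight retriever can be swapped
in without inheriting the crawler's baggage.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Minimal callback interface that a parser needs from any retriever.
class RetrieverBase {
public:
    virtual ~RetrieverBase() {}
    // Called by a parser for each word it extracts from a document.
    virtual void got_word(const std::string &word, int location) = 0;
    // Called by a parser for each link it finds.
    virtual void got_href(const std::string &url) = 0;
};

// A lightweight retriever for documents that do not come from a web crawl.
class LocalRetriever : public RetrieverBase {
public:
    void got_word(const std::string &word, int location) override {
        words.push_back(word);
        last_location = location;
    }
    void got_href(const std::string &) override {
        // Links are ignored here: there is nothing to crawl.
    }
    std::vector<std::string> words;
    int last_location = -1;
};

// A toy parser that depends only on the base interface.
class PlainTextParser {
public:
    void parse(const std::string &text, RetrieverBase &retriever) const {
        std::string word;
        int location = 0;
        for (char c : text) {
            if (c == ' ' || c == '\n' || c == '\t') {
                if (!word.empty()) {
                    retriever.got_word(word, location++);
                    word.clear();
                }
            } else {
                word += c;
            }
        }
        if (!word.empty())
            retriever.got_word(word, location);   // flush the final word
    }
};
```

The existing Retriever would become one more class derived from the base,
and a parser written this way could be mixed with either retriever.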


2. Include some code from htsearch & create PHP wrapper functions for the
searching code.
        The current htsearch-php3.0.1.1 module written by Torsten Neuer calls
the htsearch cgi-bin and repackages the output.  PHP is a very flexible
web language with strong string-manipulation ability (much like Perl).  It
would be more elegant to have a set of PHP wrappers written in C that
provide an interface back and forth to the core searching code.

        While I'm not suggesting that this replace the htsearch cgi-bin,
a lot of the query parsing code could be replaced with fewer lines of
PHP.  A developer could even write a different query language for the core
searching code (think 'SearchSQL' or something like it).  This will be
especially powerful once the 'indexable fields' feature is incorporated.
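
As an illustration of keeping the query language separate from the core
searching code, here is a small sketch.  CoreIndex and parse_simple_query
are hypothetical stand-ins, not htsearch code: the "core" only takes a
list of terms, and the front end that produces that list could just as
well be written in PHP or use a different syntax.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the core searching code: word -> document ids.
class CoreIndex {
public:
    void add(int doc_id, const std::string &word) {
        postings[word].insert(doc_id);
    }

    // Return the ids of documents containing every term (boolean AND).
    // An empty term list, or any unknown term, yields an empty result.
    std::set<int> match_all(const std::vector<std::string> &terms) const {
        std::set<int> result;
        bool first = true;
        for (const std::string &term : terms) {
            auto it = postings.find(term);
            if (it == postings.end())
                return std::set<int>();
            if (first) {
                result = it->second;
                first = false;
                continue;
            }
            std::set<int> kept;
            for (int id : result)
                if (it->second.count(id))
                    kept.insert(id);
            result.swap(kept);
        }
        return result;
    }

private:
    std::map<std::string, std::set<int> > postings;
};

// One possible front end: whitespace-separated terms, all of them required.
std::vector<std::string> parse_simple_query(const std::string &query) {
    std::istringstream in(query);
    std::vector<std::string> terms;
    std::string term;
    while (in >> term)
        terms.push_back(term);
    return terms;
}
```

A different query language would only need to replace parse_simple_query;
the index side stays untouched.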

        At the end of this process htdig can be integrated with other
software in a variety of ways, in some cases taking the place of a
SQL database for storing, querying & optionally displaying documents.

        We will hopefully be using this as a document archiving tool that
will eliminate lots of old/infrequently used documents from a SQL
database.

        I'm going to try to keep this project as a patch from the current
snapshot.  Unpack it, run a script file (copies files, diffs code), run
make, and you are done.

Feedback is welcome!

-- 
Neal Richter 
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site



_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
