Some months ago I built an index from the Reuters collection. I converted
the SGML files to XML with a tool I found on the net (just google for it),
then parsed the resulting files with a standard DOM parser to create the
index. If you have problems parsing the SGML files directly, I think you
should consider converting them to XML first. Otherwise, post a sketch of
your indexing code so we can help.
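
To give an idea of the DOM step, here is a minimal sketch, not my original
code. It assumes the converted XML keeps the usual Reuters element names
(REUTERS with a NEWID attribute, TITLE, BODY); your conversion tool may
produce something different, so check the output first. The class and method
names are made up for the example, and the indexing call itself is left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: element and attribute names are assumptions about
// what the SGML-to-XML conversion produces.
public class ReutersXmlSketch {

    /** Returns one {id, title, body} triple per REUTERS element. */
    public static List<String[]> extract(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document dom = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            NodeList articles = dom.getElementsByTagName("REUTERS");
            List<String[]> records = new ArrayList<>();
            for (int i = 0; i < articles.getLength(); i++) {
                Element article = (Element) articles.item(i);
                records.add(new String[] {
                    article.getAttribute("NEWID"),
                    firstText(article, "TITLE"),
                    firstText(article, "BODY")
                });
            }
            return records;
        } catch (Exception e) {
            // Parse failures are reported instead of silently swallowed.
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Text of the first descendant with the given tag, or "" if absent. */
    private static String firstText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String sample =
            "<LEWIS><REUTERS NEWID=\"1\"><TEXT><TITLE>Sample title</TITLE>"
            + "<BODY>Sample body text.</BODY></TEXT></REUTERS></LEWIS>";
        List<String[]> records = extract(sample);
        if (records.isEmpty()) {
            System.err.println("No REUTERS elements found; nothing to index.");
        }
        for (String[] r : records) {
            // This is the point where each record would go to the indexer.
            System.out.println(r[0] + " | " + r[1] + " | " + r[2]);
        }
    }
}
```

Printing the record count before indexing is a cheap way to tell whether an
empty index comes from the parsing step or from the indexing step.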

Lorenzo

On 4/21/06, Malcolm Clark <[EMAIL PROTECTED]> wrote:
>
> Hi all,
> I didn't know whether to add this to the thread asking about TREC indexing
> or start a new one.
> Anyway, has anyone attempted to index/search the Reuters collection which
> consists of SGML?
> Mine seems to run through the process okay but alas I'm left with nothing
> in the index when I check with Luke or my own Search Engine.
> Anyone got any hints (apart from don't do it)?
> cheers,
> MC
>
