Or could be handled by the Ant index task I submitted:
<index index="${build.dir}/index">
<fileset dir="docs">
<exclude name="**/header.html"/>
</fileset>
</index>
----- Original Message -----
From: "Steven J. Owens" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, February 14, 2002 8:41 PM
Subject: Re: excluding files / refining search
> Brian Rook <[EMAIL PROTECTED]> writes:
> > The site I'm working on has a lot of small html files that are used for
page
> > construction (nav bars, footers, etc) and they're being returned high in
the
> > results because they contain the search term(s) I'm looking for and are
> > small so they rank higher than larger documents.
> >
> > I want to exclude them from the index and I've come up with two ideas:
> >
> > 1) move them to a directory, which I will exclude from the index, but
I'll
> > have to change a bunch of links
> >
> > 2) detect them with some sort of flag and exclude them from the index.
We
> > were thinking that we could have a fake tag that lucene would detect and
not
> > index those pages.
>
> Why not just have an exclude list of some sort? In the code you
> wrote to select files for indexing, just have it check against a list
> of files you want to exclude. In the demo application, you would edit
>
> jakarta-lucene/src/demo/org/apache/lucene/demo/IndexFiles.java
>
>
> The quick and dirty method would be to edit this section of code:
>
> public static void indexDocs(IndexWriter writer, File file)
> throws Exception {
> if (file.isDirectory()) {
> String[] files = file.list();
> for (int i = 0; i < files.length; i++)
> indexDocs(writer, new File(file, files[i]));
> } else {
> System.out.println("adding " + file);
> writer.addDocument(FileDocument.Document(file));
> }
> }
>
>
> To something like this:
>
>
> public static void indexDocs(IndexWriter writer, File file)
> throws Exception {
> if (file.isDirectory()) {
> String[] files = file.list();
> for (int i = 0; i < files.length; i++)
> indexDocs(writer, new File(file, files[i]));
> } else {
> if (checkFileName(file)) {
> System.out.println("skipping " + file) ;
> } else {
> System.out.println("adding " + file);
> writer.addDocument(FileDocument.Document(file));
> }
> }
> }
>
> public static boolean checkFileName(File file) {
> String name = file.getName() ;
> if (name == "footer.html" ||
> name == "header.html" ||
> name == "menu.html" ||
> name == "navbar.html") {
> return false ;
> }
> return true ;
> }
>
>
> A more realistic implementation would use an "exclude file" of
> filenames to ignore, load them into a collection (probably a HashSet)
> and keep that collection around as an instance variable. Then
> checkFileName() just returns !excludedSet.contains(name).
>
> Steven J. Owens
> [EMAIL PROTECTED]
>
> --
> To unsubscribe, e-mail:
<mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail:
<mailto:[EMAIL PROTECTED]>
>
>
--
To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>