Hi Donna - See previous post below that may help. Tom
////////////////////////////////////////////////////////
Hi,
In case this is of help to others:
Crux of problem:
I wanted numbers and characters such as # and + to be kept as part of
the indexed and searched terms instead of being stripped.
Solution:
Implement a LowercaseWhitespaceAnalyzer and a LowercaseWhitespaceTokenizer,
and pass the analyzer to the IndexWriter, i.e.
IndexWriter writer = new IndexWriter(INDEX_DIR, new LowercaseWhitespaceAnalyzer(), true);
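For anyone without the two classes to hand, here is a minimal sketch of the tokenizing rule they are meant to implement: split on whitespace only, then lowercase. It is plain Java, not Lucene code, and LowercaseWhitespaceSketch and tokenize are hypothetical names used only for illustration. In Lucene the same rule would most likely live in a CharTokenizer subclass, but that is an assumption about the implementation and not something shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not a Lucene class: it only mimics the token
// boundaries and lowercasing that the analyzer is meant to produce.
final class LowercaseWhitespaceSketch {

    // Split on runs of whitespace and lowercase every character.
    // Digits and symbols such as '+', '#' and '.' stay inside the token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace ends the current token, if there is one.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(Character.toLowerCase(c));
            }
        }
        // Flush the last token when the text does not end in whitespace.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

With this rule, tokenize("C++ AND C#") gives the three tokens c++, and, c#, so the symbols survive and matching no longer depends on case.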
Tom
=======================================================================
Diagnostics:
StandardAnalyzer
----------------
Enter Querystring: (C++ AND C#)
Searching for: +c +c
Enter Querystring: (C\+\+ AND C\#)
Searching for: +c +c
Enter Querystring: ("moss 2007" or "sharepoint 2007") and "asp.net"
Searching for: ("moss 2007" "sharepoint 2007") asp.net
SimpleAnalyzer
--------------
Enter Querystring: C++
Searching for: c
Enter Querystring: C#
Searching for: c
Enter Querystring: ("moss 2007" or "sharepoint 2007") and "asp.net"
Searching for: (moss or sharepoint) and "asp net"
WhitespaceAnalyzer
------------------
Enter Querystring: (C++ AND C#)
Searching for: +C++ +C#
Enter Querystring: ("moss 2007" or "sharepoint 2007") and "asp.net"
Searching for: ("moss 2007" or "sharepoint 2007") and asp.net
KeywordAnalyzer
---------------
Enter Querystring: (C++ AND C#)
Searching for: +C++ +C#
Enter Querystring: ("moss 2007" or "sharepoint 2007") and "asp.net"
Searching for: (moss 2007 or sharepoint 2007) and asp.net
StopAnalyzer
------------
Enter Querystring: (C\++ AND C\#)
Searching for: +c +c
Enter Querystring: ("MOSS 2007" or "SHAREPOINT 2007") and "ASP.NET"
Searching for: (moss sharepoint) "asp net"
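The pattern in these diagnostics comes down to where each analyzer draws token boundaries. As a rough illustration, the sketch below contrasts a letters-only rule (the behaviour seen for SimpleAnalyzer above) with a whitespace-only rule (the behaviour seen for WhitespaceAnalyzer). It is plain Java, not Lucene code; TokenBoundarySketch and its methods are hypothetical names, and it models tokenization only, not the query parser's handling of AND/OR or quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical illustration, not a Lucene class.
final class TokenBoundarySketch {

    // Collect maximal runs of characters accepted by isTokenChar.
    static List<String> split(String text, IntPredicate isTokenChar, boolean lowercase) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar.test(c)) {
                current.append(lowercase ? Character.toLowerCase(c) : c);
            } else if (current.length() > 0) {
                // A rejected character closes the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Letters only, lowercased: digits and symbols act as separators.
    static List<String> letterTokens(String text) {
        return split(text, Character::isLetter, true);
    }

    // Anything that is not whitespace, case preserved.
    static List<String> whitespaceTokens(String text) {
        return split(text, c -> !Character.isWhitespace(c), false);
    }
}
```

Under the letters-only rule, C++ and C# both collapse to c, asp.net splits into asp and net, and 2007 disappears. Under the whitespace rule, C++ and C# come through unchanged, which is why a lowercasing variant of it was the fix.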
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
-----Original Message-----
From: Donna L Gresh [mailto:[EMAIL PROTECTED]
Sent: 04 March 2008 19:22
To: [email protected]
Subject: C++ as token in StandardAnalyzer?
I saw some discussion in the archives some time ago about the fact that
C++ is tokenized as C in the StandardAnalyzer; this still seems to be
the case. I was wondering if there is a simple way for me to get the
behavior I want for C++ (that it is tokenized as C++) in particular, and
perhaps for other more idiosyncratic terms I may have in my own
application. Thanks, Donna