-----------------------------------------------------------
New Message on BDOTNET
-----------------------------------------------------------
From: bignbullish
Message 8 in Discussion
Hi,
As I understand it now, you need to fine-tune your search. Ok, the three main
collection classes we have in .NET are ArrayList, SortedList and Hashtable.
Of these, Hashtable is in most cases the fastest when searched by its key.
So, let's choose Hashtable. Next, we need a list of words to search and a list of
keywords. Let's keep each of them in its own Hashtable. Since these are strings, and
string comparisons are costly, let's use the hash codes of the strings (plain
integers) as the keys when adding them to their respective Hashtables. Ok, we are
done with the data. Now the logic part. My approach to this would be:
1> get the text of the MSWord document
2> find the words in them using Regular Expressions
3> match the words with keywords
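
Steps 2 and 3 can be sketched on their own, without Word, roughly like this. It is
only an illustration, not the posted method: the KeywordMatcher class and its method
names are made up, it assumes .NET 3.5 or later (for HashSet), and it keys on the
words themselves rather than on their hash codes, because two different strings can
share a hash code and a hash-only lookup could then report a false match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original post.
static class KeywordMatcher
{
    // Same word pattern as in the post: runs of word characters between boundaries.
    static readonly Regex WordRegex = new Regex(@"\b\w+\b", RegexOptions.Compiled);

    // Step 2: collect the distinct words found in the text (case-sensitive).
    public static HashSet<string> ExtractWords(string text)
    {
        var words = new HashSet<string>(StringComparer.Ordinal);
        foreach (Match m in WordRegex.Matches(text))
            words.Add(m.Value);   // Add ignores words that are already present
        return words;
    }

    // Step 3: return the keywords that occur as whole words in the text.
    public static List<string> FindKeywords(string text, IEnumerable<string> keywords)
    {
        HashSet<string> words = ExtractWords(text);
        var found = new List<string>();
        foreach (string keyword in keywords)
        {
            if (words.Contains(keyword))
                found.Add(keyword);
        }
        return found;
    }
}
```

For example, KeywordMatcher.FindKeywords("Jai went home", new[] { "Jai", "Raghu" })
gives back a list holding only "Jai". The set still hashes each string once on lookup;
the full string comparison only happens when two hash codes land in the same bucket.
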
This approach is by no means the fastest. But it is probably the better choice among
the existing options. Try the code attached below to see for yourself.
- Raghu (bignbullish)
// Requires: using System; using System.Collections;
// using System.Text.RegularExpressions; using System.Windows.Forms;
// plus a reference to the Microsoft Word object library (the Word namespace).
private void SearchDocFiles()
{
    // Let's create the list of keywords to search for: hash code -> keyword.
    Hashtable ht = new Hashtable();
    ht.Add("Jai".GetHashCode(), "Jai");
    int count = 0;
    Random r = new Random();
    // Add 100 random keywords. (The condition was count == 100, so the loop never ran.)
    while (count < 100)
    {
        string temp = "Jai" + r.Next().ToString();
        // Skip a hash code that is already present instead of catching the Add exception.
        if (ht.ContainsKey(temp.GetHashCode()))
            continue;
        ht.Add(temp.GetHashCode(), temp);
        count++;
    }

    OpenFileDialog openFileDialog1 = new OpenFileDialog();
    if (openFileDialog1.ShowDialog() != DialogResult.OK)
        return;

    // A Regex to find the words.
    Regex searchRegex = new Regex(@"\b\w+\b", RegexOptions.Compiled);

    object fileName = openFileDialog1.FileName;
    object readOnly = false;
    object isVisible = false;
    object saveChanges = false;
    object missing = System.Reflection.Missing.Value;

    // Start Word only once a file has been chosen, and always shut it down again.
    Word.ApplicationClass wordApp = new Word.ApplicationClass();
    wordApp.Visible = false;
    string display;
    try
    {
        Word.Document aDoc = wordApp.Documents.Open(ref fileName, ref missing, ref readOnly,
            ref missing, ref missing, ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref isVisible, ref missing, ref missing, ref missing);
        aDoc.Activate();
        wordApp.Selection.WholeStory();
        // Get the text of the doc file.
        display = wordApp.Selection.Text;
    }
    finally
    {
        wordApp.Quit(ref saveChanges, ref missing, ref missing);
    }

    // Use the Regex to find the words in the text.
    MatchCollection mc = searchRegex.Matches(display);
    Hashtable sl = new Hashtable();
    // Log the start time in ticks.
    long startTick = DateTime.Now.Ticks;
    foreach (Match m in mc)
    {
        // Make a Hashtable of all the words found by the Regex: hash code -> word.
        // Repeated words are stored once.
        int key = m.Value.GetHashCode();
        if (!sl.ContainsKey(key))
            sl.Add(key, m.Value);
    }

    display = "";
    // Now look up the hash code of each keyword among the hash codes of the words found.
    foreach (int hashCode in ht.Keys)
    {
        if (sl.ContainsKey(hashCode))
            display += "Match found : " + hashCode.ToString() + Environment.NewLine;
    }
    display += "Total ticks : " + Convert.ToString(DateTime.Now.Ticks - startTick);
    MessageBox.Show(display);
}
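
A note on the tick count reported above: DateTime.Now is typically refreshed only
every several milliseconds, so the difference of two readings can be coarse for a
short run like this one. A minimal sketch of an alternative, assuming .NET 3.5 or
later where System.Diagnostics.Stopwatch and the Action delegate are available; the
Timing class and the MeasureTicks name are made up for illustration.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration; not part of the original post.
static class Timing
{
    // Runs the given work once and returns the elapsed time in 100-nanosecond
    // ticks, the same unit that DateTime.Ticks uses.
    public static long MeasureTicks(Action work)
    {
        Stopwatch sw = Stopwatch.StartNew();
        work();
        sw.Stop();
        // Elapsed is a TimeSpan, so .Ticks is in 100 ns units
        // (unlike sw.ElapsedTicks, which counts raw timer ticks).
        return sw.Elapsed.Ticks;
    }
}
```

The word-matching loops could then be wrapped in a delegate and passed to
Timing.MeasureTicks, with the returned value shown in place of the DateTime difference.
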