-----------------------------------------------------------

New Message on BDOTNET

-----------------------------------------------------------
From: bignbullish
Message 8 in Discussion


Hi, 
I guess what I understand now is that you need to fine-tune your search. OK, three 
commonly used collection classes in .NET are ArrayList, SortedList and Hashtable. 
Among these, Hashtable is in most cases the fastest when searched on its key. 
So, let's choose Hashtable. Next, we need a list of the words found in the document 
and a list of keywords to look for. Let's keep each of them in its own Hashtable. Since 
these are strings and string comparisons are costly, let's use the hash codes of the 
strings (plain integers) as the keys when adding them to their respective hashtables. 
One caveat: two different strings can share a hash code, so a hit on the hash code 
alone is a probable match, not a guaranteed one. OK, we are done with the data. Now 
the logic part. My approach to this would be ...  
1> get the text of the MSWord document   
2> find the words in them using Regular Expressions 
3> match the words with keywords 
This is by no means the fastest method, but it is probably a better choice than the 
other options available. Try the code attached below to see for yourself. 
- Raghu (bignbullish) 
  
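// Needs: using System.Collections; using System.Text.RegularExpressions;
// using System.Windows.Forms; and a COM reference to the Microsoft Word
// object library (imported here as the Word namespace).
private void SearchDocFiles() {
    // Build the list of keywords to search for, keyed by hash code.
    Hashtable ht = new Hashtable();
    ht.Add("Jai".GetHashCode(), "Jai");
    int count = 0;
    Random r = new Random();
    // Add 100 random keywords; a duplicate hash code is skipped and retried.
    while ( count < 100 )
    {
        string temp = "Jai" + r.Next().ToString();
        int key = temp.GetHashCode();
        if ( ht.ContainsKey(key) ) continue;
        ht.Add(key, temp);
        count++;
    }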
    OpenFileDialog openFileDialog1 = new OpenFileDialog();
    // A Regex that matches whole words.
    Regex searchRegex = new Regex(@"\b\w+\b", RegexOptions.Compiled);
    if (openFileDialog1.ShowDialog() == DialogResult.OK)
    {
        // Start Word only once a file has been chosen, so that cancelling
        // the dialog does not leave a hidden Word process running.
        Word.ApplicationClass wordApp = new Word.ApplicationClass();
        object fileName = openFileDialog1.FileName;
        object readOnly = false;
        object isVisible = false;
        object saveChanges = false;
        object missing = System.Reflection.Missing.Value;
        wordApp.Visible = false;
        Word.Document aDoc = wordApp.Documents.Open(ref fileName, ref missing, ref readOnly,
            ref missing, ref missing, ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref isVisible, ref missing, ref missing, ref missing);
        aDoc.Activate();
        wordApp.Selection.WholeStory();
        // Get the text of the doc file, then close Word without saving.
        string display = wordApp.Selection.Text;
        wordApp.Quit(ref saveChanges, ref missing, ref missing);
        // Use the Regex to find the words in the text.
        MatchCollection mc = searchRegex.Matches(display);
        Hashtable sl = new Hashtable();
        // Log the start time in ticks (1 tick = 100 ns).
        long startTick = DateTime.Now.Ticks;

        // Build a hashtable of all the words found by the Regex: hash code -> word.
        foreach ( Match m in mc )
        {
            int key = m.Value.GetHashCode();
            // Skip repeated words (and any word whose hash code is already present).
            if ( !sl.ContainsKey(key) )
                sl.Add(key, m.Value);
        }
        display = "";
        // Now look up the hash code of each keyword among the hash codes of the words found.
        foreach ( int hashCode in ht.Keys )
        {
            if ( sl.ContainsKey(hashCode) )
                display += "Match found : " + hashCode.ToString() + Environment.NewLine;
        }
        display += "Total ticks : " + Convert.ToString(DateTime.Now.Ticks - startTick);
        MessageBox.Show(display);
    }
}
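
If you want to try the matching logic without Word installed, steps 2 and 3 can be sketched on a plain string. This is only a rough sketch: the class KeywordMatcher, the method FindKeywords and the sample text are made up for illustration and are not part of the code above. It also compares the strings after a hash code hit, so a collision cannot report a wrong keyword. A word whose hash code collides with an earlier, different word is still dropped from the table, so in that rare case a keyword can be missed.

```csharp
using System;
using System.Collections;
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration; not part of the original post.
public static class KeywordMatcher
{
    // Same word-finding pattern as in SearchDocFiles.
    private static readonly Regex WordRegex =
        new Regex(@"\b\w+\b", RegexOptions.Compiled);

    // Returns the keywords that occur as whole words in text, in keyword order.
    public static ArrayList FindKeywords(string text, string[] keywords)
    {
        // Step 2: collect the words of the text, keyed by hash code.
        Hashtable words = new Hashtable();
        foreach (Match m in WordRegex.Matches(text))
        {
            int key = m.Value.GetHashCode();
            if (!words.ContainsKey(key))
                words.Add(key, m.Value);
        }

        // Step 3: look up each keyword by hash code, then compare the
        // strings so a hash collision cannot produce a false match.
        ArrayList found = new ArrayList();
        foreach (string keyword in keywords)
        {
            object candidate = words[keyword.GetHashCode()];
            if (candidate != null && (string)candidate == keyword)
                found.Add(keyword);
        }
        return found;
    }

    public static void Main()
    {
        // Sample data, assumed for the demo only.
        string text = "Jai wrote the report; Raghu reviewed it.";
        string[] keywords = { "Jai", "Raghu", "report", "missing" };
        foreach (string k in FindKeywords(text, keywords))
            Console.WriteLine("Match found : " + k);
    }
}
```

To use it with a real document, pass the text read from Word (the display string in SearchDocFiles) as the first argument.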
