I have tried to implement a multithreaded crawler and I am having
quite a few issues with it. As a newbie, I am finding multithreading
quite a bit harder than I thought it would be.
First, this is how I start the threads:
[code/]
startwatch.Start()
'myQ.Enqueue(thread)
For Each link In completeList
    Try
        Dim thread = New Thread(AddressOf processUrl)
        numThread = numThread + 1
        If numThread < 10 Then
            ThreadList.Add(thread)
        End If
        thread.Start(link)
    Catch ex As Exception
        MessageBox.Show(ex.Message.ToString, "Error Message", MessageBoxButtons.OK)
    End Try
Next
For Each thread In ThreadList
    thread.Join()
Next
startwatch.Stop()
elapsedTime = startwatch.ElapsedMilliseconds
[/code]
The idea is to read a list of URLs and then have a few threads go and
fetch the pages. Once I get the pages, I will parse the HTML to extract
more URLs and then write these to the database. My problem is where I
try to parse the pages and write to the database. I use SyncLock, but
I am still getting errors saying it cannot access either the file or
the database. At one point the program just crashes. Here is a
peek at the calling methods:
[code/]
If Not String.IsNullOrEmpty(html) Then
    'get all links first
    SyncLock html
        links = parser.GetLinks(fromUrl, html)
    End SyncLock
    For Each link As String In links
        ...
        ...
        ...
        Links_DBObj.insert_feedurls_link(link, feedlink, execError, connObj_Generic, commObj_Generic)
[/code]
Does anyone have any suggestions? Others have mentioned using
synchronous queues etc., but I am not too sure how to do that. What would be
the most efficient way to implement this? Should the threads just fetch the
URLs, with the pages then parsed individually, or can I have the threads do
the parsing too?
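
For reference, here is my rough understanding of what the queue approach
might look like. This is only a sketch and not my real code: it assumes
.NET 4 or later for ConcurrentQueue, and FetchAndParse, dbLock, saved and
the example.com URLs are placeholder names I made up for illustration. A
fixed number of worker threads pull URLs from one shared queue, each
worker fetches and parses its own page, and only the shared write is
protected by a single lock object that every thread uses.
[code/]
Imports System
Imports System.Collections.Concurrent
Imports System.Collections.Generic
Imports System.Threading

Module CrawlerSketch
    ' Shared, thread-safe queue of URLs waiting to be fetched.
    Private ReadOnly pending As New ConcurrentQueue(Of String)()
    ' One shared lock object; every thread locks on this SAME instance.
    Private ReadOnly dbLock As New Object()
    ' Placeholder for the database: a plain list guarded by dbLock.
    Private ReadOnly saved As New List(Of String)()

    ' Hypothetical stand-in for downloading a page and extracting its links.
    Private Function FetchAndParse(url As String) As List(Of String)
        Return New List(Of String) From {url & "/a", url & "/b"}
    End Function

    Private Sub Worker()
        Dim url As String = Nothing
        ' TryDequeue is atomic, so two threads never receive the same URL.
        While pending.TryDequeue(url)
            ' Fetching and parsing touch no shared state, so no lock here.
            Dim links = FetchAndParse(url)
            ' Only the shared write is serialized.
            SyncLock dbLock
                saved.AddRange(links) ' the database insert would go here
            End SyncLock
        End While
    End Sub

    Sub Main()
        For Each u In {"http://example.com/1", "http://example.com/2", "http://example.com/3"}
            pending.Enqueue(u)
        Next

        ' Start a fixed number of workers and keep ALL of them for Join.
        Dim workers As New List(Of Thread)()
        For i = 1 To 4
            Dim t As New Thread(AddressOf Worker)
            workers.Add(t)
            t.Start()
        Next
        For Each t In workers
            t.Join()
        Next
        Console.WriteLine(saved.Count)
    End Sub
End Module
[/code]
In this sketch the workers stop as soon as the queue is empty, so it only
suits a fixed list of URLs. If newly found links were added back to the
queue, a worker could exit while another is still parsing, and some other
way of signalling completion would be needed.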