I will describe my situation in detail, because otherwise I think most
people will not understand what is happening.
I'm using remoting for communication in a distributed simulation of
pollutant transport in a water body, such as an estuary. Parallelization is
achieved by domain decomposition: each machine is assigned a piece of the 2D
computational domain. One machine with one processor can receive several
partitions (small spatial regions) in order to mitigate possible load
imbalances. Therefore, on each machine I have one thread per partition, and
I use a Mutex so that only one of them runs at a time.

To minimize the communication cost, I first have to find which regions
inside a partition influence the neighboring regions. I then apply the
numerical method to those regions and asynchronously send the pollutant mass
to the neighbors, while at the same time applying the numerical method to
the regions that have no influence on the neighbors. As a result, each
machine has a main thread responsible for the simulation, plus the secondary
threads created by the asynchronous communication with the neighbors. The
simulation thread runs at below-normal priority.
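
To make the threading setup concrete, here is a minimal sketch of the
one-thread-per-partition scheme described above. It is not my actual code:
Partition, Step and the halving of the mass are hypothetical placeholders,
the remoting calls to the neighbors are left out, and giving every partition
thread below-normal priority is an assumption made only for the
illustration.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for one spatial partition of the 2D domain.
class Partition
{
    public readonly int Id;
    public double Mass; // pollutant mass held by the partition (illustrative)

    public Partition(int id, double mass) { Id = id; Mass = mass; }

    // Placeholder for the numerical method applied in one time step.
    public void Step() { Mass *= 0.5; }
}

class Machine
{
    // One mutex per machine: only one partition thread computes at a time.
    static readonly Mutex ComputeLock = new Mutex();

    static void RunPartition(object state)
    {
        Partition p = (Partition)state;
        for (int step = 0; step < 3; step++)
        {
            ComputeLock.WaitOne();
            try { p.Step(); }
            finally { ComputeLock.ReleaseMutex(); }
        }
    }

    static void Main()
    {
        Partition[] parts = new Partition[9]; // 9 partitions on this machine
        Thread[] threads = new Thread[parts.Length];

        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = new Partition(i, 8.0);
            threads[i] = new Thread(new ParameterizedThreadStart(RunPartition));
            threads[i].Priority = ThreadPriority.BelowNormal; // assumption
            threads[i].Start(parts[i]);
        }

        // Wait for every partition thread before reading the results.
        foreach (Thread t in threads) t.Join();

        foreach (Partition p in parts)
            Console.WriteLine("partition {0}: mass {1}", p.Id, p.Mass);
    }
}
```

In the real application each time step would also start the asynchronous
sends to the neighboring partitions, which is where the secondary threads
come from.
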
For example, if I have a cluster of 2 machines, each with 9 partitions,
there is no problem. If I increase the number of partitions per machine, I
increase the number of possible neighbors and therefore the number of
secondary threads, and this is where the problem lies. Beyond a certain
number of secondary threads, the remoting mechanism completely stops
answering any new call. Everything stays blocked and the CPUs sit idle.
To test whether the problem was in my own synchronization code, I have a
button in my end-user application that makes a trivial call to one of the
server components that I am sure has no synchronization problems. The call
never arrives at the server: the client stays blocked and my breakpoint on
the server side is never hit.

From all this, I think the synchronization problem is in some remoting
class. An idle system that does not answer anything really looks like a
synchronization issue.

If anyone has a suggestion on how to solve this, I'd really like to hear it.

Regards,

Manuel

You can read messages from the Advanced DOTNET archive, unsubscribe from
Advanced DOTNET, or subscribe to other DevelopMentor lists at
http://discuss.develop.com.
