Hi,

I recently discovered an interesting bug that occurs when you make a lot of parallel requests to the Deltacloud API.

Let's say you start the Deltacloud API with the 'mock' driver as the default driver. Then you make 3 parallel requests to retrieve RHEV-M images, realms and hardware_profiles. In that case I get this error:

<snip>
E, [2013-04-18T12:44:13.629135 #11892] ERROR -- 500: [LoadError] uninitialized constant Deltacloud::Drivers::Rhevm

/home/mfojtik/code/core/server/lib/deltacloud/helpers/driver_helper.rb:57:in `rescue in driver'
/home/mfojtik/code/core/server/lib/deltacloud/helpers/driver_helper.rb:54:in `driver'

127.0.0.1 - - [18/Apr/2013 12:44:13] "GET /api HTTP/1.1" rhevm http://provider.url 500 127410 0.1480
</snip>

Note that when I do the curl request manually with the same params (provider, accept), I get 200 back.

I think this might be a threading issue, but I'm not sure how to fix it.
I started looking at the 'driver' method. What it does is require the driver source file and then return the initialized driver.

We do this for every request (except that we don't 'require' the driver source if the driver class already exists in the current namespace).

My impression is that the 'require' method is not thread-safe. This code illustrates where it matters:

begin
  driver_class
rescue NameError => e
  require_relative(driver_source_name) ? retry :
     raise(LoadError.new(e.message))
end

In this case, we try to return the 'driver_class', and if the constant does not exist, we require the driver source file and call 'retry', which then tries to return it again. I think that if you have multiple parallel requests, 'require_relative' (which is essentially 'require' with the path resolved relative to the current file) behaves incorrectly, because multiple threads are requiring the same file in parallel.
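One possible mechanism worth checking (an assumption, not verified against the Deltacloud code): 'require' returns false when the file is already loaded, and a thread that waits for another thread's in-progress load of the same file also gets false once that load finishes. With the ternary above, false goes straight to 'raise(LoadError...)' even though the constant is available by then. The sketch below uses a made-up slow "driver" file (SlowDemoDriver and the temp path are hypothetical) just to observe the two return values:

```ruby
require 'tmpdir'

# Hypothetical stand-in for a driver source file: loading takes a while
# and the constant is only defined at the very end.
SLOW_DRIVER_SOURCE = <<~RUBY
  sleep 0.3
  module SlowDemoDriver; end
RUBY

# Require the same file from two threads and return both return values.
def concurrent_require_results
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'slow_demo_driver.rb')
    File.write(path, SLOW_DRIVER_SOURCE)

    first = Thread.new { require(path) }
    sleep 0.1 # give the first thread time to start loading
    second = Thread.new { require(path) }

    [first.value, second.value]
  end
end

RESULTS = concurrent_require_results
p RESULTS
```

If the second value is false, the thread that lost the race would raise LoadError in the 'driver' method, which would match the 500 seen only under parallel load.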

My fix for this, which works for me so far, is to change the line after 'rescue NameError => e' to:

Thread.exclusive { require_relative(driver_source_name) } ? retry : raise(LoadError.new(e.message))
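A minimal sketch of a variant of this fix, under stated assumptions: it serializes loading with an explicit Mutex (Thread.exclusive was deprecated in Ruby 2.3 and removed in 3.0) and decides success by checking whether the constant exists afterwards instead of relying on the return value of 'require'. The names load_driver_class, DRIVER_LOAD_LOCK and MutexDemoDriver are hypothetical and not part of Deltacloud; the real 'driver' helper would need its own adaptation.

```ruby
require 'tmpdir'

# One global lock so only one thread at a time loads driver sources.
DRIVER_LOAD_LOCK = Mutex.new

# Hypothetical helper: return the top-level constant `const_name`,
# loading `source_path` first if the constant is not defined yet.
def load_driver_class(const_name, source_path)
  Object.const_get(const_name)
rescue NameError => e
  DRIVER_LOAD_LOCK.synchronize do
    # Another thread may have loaded the file while we waited for the lock.
    require(source_path) unless Object.const_defined?(const_name)
  end
  # Judge success by the constant, not by what `require` returned.
  raise LoadError, e.message unless Object.const_defined?(const_name)
  Object.const_get(const_name)
end

# Demo: three threads ask for the same, not yet loaded, class.
LOADED = Dir.mktmpdir do |dir|
  path = File.join(dir, 'mutex_demo_driver.rb')
  File.write(path, "sleep 0.2\nclass MutexDemoDriver; end\n")
  Array.new(3) { Thread.new { load_driver_class('MutexDemoDriver', path) } }.map(&:value)
end
```

The design choice here is that a false return from 'require' is no longer treated as a failure; only a missing constant after the load attempt is.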

With this I don't run into any problems with parallel requests (so far).
However, I'm not a Ruby threads expert, so any advice from somebody more experienced is appreciated.

Also, I think one task we need to do in Deltacloud/CIMI in the near future is to identify spots in our code base that are potentially not thread-safe.

Having somebody write a reasonable benchmarking tool (like 'ab', but with different URLs/drivers/etc.) would help identify these spots.
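As a starting point, such a tool could be sketched roughly like this. It fires one thread per target and tallies the HTTP status codes. Everything here is an assumption for illustration: the helper names are hypothetical, and the header names (X-Deltacloud-Driver, X-Deltacloud-Provider), the URLs and the port in the usage comment should be checked against the actual server setup.

```ruby
require 'net/http'
require 'uri'

# Default fetcher: GET `url` with the given driver/provider headers and
# return the HTTP status code as an Integer. Header names are assumed.
HTTP_FETCH = lambda do |url, driver, provider|
  uri = URI(url)
  request = Net::HTTP::Get.new(uri)
  request['Accept'] = 'application/xml'
  request['X-Deltacloud-Driver'] = driver
  request['X-Deltacloud-Provider'] = provider
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request).code.to_i }
end

# Run every [url, driver, provider] target in its own thread and
# return a hash of status code => number of responses.
def parallel_status_tally(targets, fetch = HTTP_FETCH)
  threads = targets.map do |url, driver, provider|
    Thread.new { fetch.call(url, driver, provider) }
  end
  threads.map(&:value).each_with_object(Hash.new(0)) do |code, tally|
    tally[code] += 1
  end
end

# Usage (assumes a server listening on localhost:3001):
#   targets = %w[images realms hardware_profiles].map do |collection|
#     ["http://localhost:3001/api/#{collection}", 'rhevm', 'http://provider.url']
#   end
#   p parallel_status_tally(targets)
```

Passing the fetcher as an argument keeps the threading part testable without a running server, since a stub lambda can stand in for the real HTTP call.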

  -- Michal

--

Michal Fojtik <mfoj...@redhat.com>
Deltacloud API, CloudForms
