A single user may have O(100) to O(1000) experiments, so 10K is too
small as an upper bound on the registry for many users. We should really
test until things break. A plot implying infinite scaling (by
extrapolation) is not useful. A plot showing OK scaling up to a certain
point before things decay is useful.
I suggest you post a more carefully designed set of experiments, starting
with Lahiru's suggestion. How many users? How many experiments per user?
What kind of searches? Probably the most common will be "get all my
experiments that match this string", "get all experiments that have
state FAILED", and "get all my experiments from the last 30 days". But
the API may not have the latter two yet.
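[Editorial sketch] The three search patterns above can be expressed as parameterized queries. A minimal illustration follows, using an in-memory SQLite table; the table and column names here are invented for illustration and do not match the real registry schema.

```python
import sqlite3
from datetime import datetime, timedelta

# Toy schema for illustration only; the real registry schema differs.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE experiment (
    experiment_id TEXT PRIMARY KEY,
    user_name     TEXT,
    description   TEXT,
    state         TEXT,
    creation_time TEXT)""")
conn.execute(
    "INSERT INTO experiment VALUES (?, ?, ?, ?, ?)",
    ("exp1", "user1", "AMBER job 1 for user 1", "FAILED",
     datetime.now().isoformat()))

# 1. "Get all my experiments that match this string"
by_string = conn.execute(
    "SELECT experiment_id FROM experiment "
    "WHERE user_name = ? AND description LIKE ?",
    ("user1", "%AMBER%")).fetchall()

# 2. "Get all experiments that have state FAILED"
by_state = conn.execute(
    "SELECT experiment_id FROM experiment WHERE state = ?",
    ("FAILED",)).fetchall()

# 3. "Get all my experiments from the last 30 days"
# (ISO-8601 timestamps stored as text compare correctly as strings)
cutoff = (datetime.now() - timedelta(days=30)).isoformat()
by_date = conn.execute(
    "SELECT experiment_id FROM experiment "
    "WHERE user_name = ? AND creation_time >= ?",
    ("user1", cutoff)).fetchall()
```

Each of the three shapes stresses a different index (description text match, state equality, creation-time range), which is why they are worth benchmarking separately.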
So to start, you should specify a prototype user. For example, each
user will have 1000 experiments: 100 AMBER jobs, 100 LAMMPS jobs, etc.
Each user will have a unique but human-readable name (user1, user2,
...). Each experiment will have a unique, human-readable description
(AMBER job 1 for user 1, AMBER job 2 for user 1, ...) that is
suitable for searching.
Post these details first, and then you can create, via scripts,
experiment registries of any size. Each experiment is different but
suitable for pattern searching.
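[Editorial sketch] A generator script along the lines Marlon describes could look like the following. Only AMBER and LAMMPS are named in the thread; the other application names are placeholders assumed purely to round out the 1000-experiments-per-user prototype.

```python
# Assumed application mix: AMBER and LAMMPS come from the thread; the
# remaining names are placeholders to fill out 1000 experiments per user.
APPS = ["AMBER", "LAMMPS", "GROMACS", "NAMD", "GAUSSIAN",
        "ESPRESSO", "CP2K", "NWCHEM", "TINKER", "CHARMM"]
JOBS_PER_APP = 100  # 10 applications x 100 jobs = 1000 experiments per user

def make_registry(num_users):
    """Yield one experiment record per (user, application, job) triple,
    with human-readable names and descriptions suitable for searching."""
    for u in range(1, num_users + 1):
        for app in APPS:
            for j in range(1, JOBS_PER_APP + 1):
                yield {
                    "user_name": f"user{u}",
                    "description": f"{app} job {j} for user {u}",
                }

# Registries of any size: just vary num_users.
records = list(make_registry(num_users=2))
```

Because every description follows the same pattern, substring searches like "AMBER" or "for user 1" have predictable result sizes, which makes the benchmark numbers easy to interpret.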
This is 10 minutes' worth of thought while waiting for my tea to brew, so
hopefully this is the right start, but I encourage you not to take this
as fixed instructions.
Marlon
On 8/12/14, 8:54 AM, Lahiru Gunathilake wrote:
Hi Sachith,
How did you test this ? What database did you use ?
I think 1000 experiments is a very low number. I think the most important
part is, when there is a large number of experiments, how expensive the
search is and how expensive a single experiment retrieval is.
If we support getting a defined number of experiments through the API (I
think this is the practical scenario: among 10k experiments, get 100), we
have to test the performance of that too.
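[Editorial sketch] The "among 10k experiments, get 100" scenario is ordinary pagination. A minimal illustration with SQLite's LIMIT/OFFSET follows; the accessor name and schema are hypothetical, not the actual registry API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE experiment "
             "(experiment_id INTEGER PRIMARY KEY, description TEXT)")
conn.executemany(
    "INSERT INTO experiment (description) VALUES (?)",
    [(f"experiment {i}",) for i in range(10_000)])

def get_experiments(conn, limit=100, offset=0):
    """Hypothetical paginated accessor; the real API signature may differ."""
    return conn.execute(
        "SELECT experiment_id, description FROM experiment "
        "ORDER BY experiment_id LIMIT ? OFFSET ?",
        (limit, offset)).fetchall()

first_page = get_experiments(conn, limit=100, offset=0)
```

The point worth testing is whether page-retrieval time stays flat as the table grows, since a query without LIMIT scales with total table size rather than page size.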
Regards
Lahiru
On Tue, Aug 12, 2014 at 4:59 PM, Sachith Withana <[email protected]>
wrote:
Hi all,
I'm testing the registry with 10, 1,000, and 10,000 experiments, and I've
tested the database performance by executing the getAllExperiments method.
I'll post the complete analysis.
What are the other methods that I should test?
getExperiment(experiment_id)
searchExperiment
Any pointers?
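[Editorial sketch] Whatever methods end up on the list, a small timing harness keeps the measurements comparable across them. A sketch follows; the fake client call is a stand-in assumption, to be replaced by the real Thrift client calls (getExperiment, searchExperiment, etc.).

```python
import statistics
import time

def benchmark(fn, args_list):
    """Time each call; report median and 95th-percentile latency in ms."""
    samples = []
    for args in args_list:
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {"median_ms": statistics.median(samples),
            "p95_ms": samples[int(0.95 * (len(samples) - 1))]}

# Stand-in for a real registry call such as getExperiment(experiment_id);
# swap in the actual Thrift client call when measuring the API.
def fake_get_experiment(experiment_id):
    return {"experiment_id": experiment_id}

stats = benchmark(fake_get_experiment, [(f"exp{i}",) for i in range(100)])
```

Reporting the 95th percentile as well as the median matters here because the thread already notes a first-query-slow, cached-thereafter pattern (OpenJPA caching), which a median alone would hide.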
On Wed, Jul 23, 2014 at 6:07 PM, Marlon Pierce <[email protected]> wrote:
Thanks, Sachith. Did you look at scaling also? That is, will the
operations below still be the slowest if the DB is 10x, 100x, 1000x bigger?
Marlon
On 7/23/14, 8:22 AM, Sachith Withana wrote:
Hi all,
I'm profiling the current registry in a few different aspects.
I looked into the database operations and I've listed the ones that
take the most time.
1. Getting the status of an experiment (takes around 10% of the overall
time spent). It has to go through the data-model hierarchy (nodes,
tasks, etc.) to get to the actual experiment status.
2. Dealing with the application inputs. Strangely, the queries regarding
the ApplicationInputs take a long time to complete. This is a part of
the new Application Catalog.
3. Getting all the experiments (using the * wildcard). This takes the
most time when first queried, but thanks to OpenJPA caching it flattens
out as we keep querying.
To reduce the first issue, I would suggest a separate table for
Experiment Summaries, where the status (both the state and the
state-update time) would be the only varying entity, and using that to
improve the query time for experiment summaries.
It would also help improve the performance of getting all the
experiments (experiment summaries).
WDYT?
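[Editorial sketch] The proposed summary table might look like the following; column names are illustrative, not the real registry schema. The idea is one denormalized row per experiment, so a status read or update touches a single row instead of walking the node/task hierarchy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sketch of the proposed Experiment Summaries table. Column names are
# assumptions for illustration, not the actual Airavata schema.
conn.execute("""CREATE TABLE experiment_summary (
    experiment_id     TEXT PRIMARY KEY,
    user_name         TEXT,
    description       TEXT,
    state             TEXT,   -- the only varying columns are the
    state_update_time TEXT)   -- state and its update time
""")
conn.execute(
    "INSERT INTO experiment_summary VALUES (?, ?, ?, ?, ?)",
    ("exp1", "user1", "AMBER job 1 for user 1",
     "LAUNCHED", "2014-07-23T08:00:00"))

# A status change touches only this one row:
conn.execute(
    "UPDATE experiment_summary SET state = ?, state_update_time = ? "
    "WHERE experiment_id = ?",
    ("FAILED", "2014-07-23T09:00:00", "exp1"))

row = conn.execute(
    "SELECT state, state_update_time FROM experiment_summary "
    "WHERE experiment_id = ?", ("exp1",)).fetchone()
```

The trade-off is the usual one for denormalization: every status transition now writes to two places (the hierarchy and the summary row), in exchange for cheap summary and get-all queries.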
ToDos: look into memory consumption (memory leakage, etc.).
Any more suggestions?
--
Thanks,
Sachith Withana