The Derby one.
On Thu, Aug 14, 2014 at 7:06 PM, Chathuri Wimalasena <[email protected]> wrote:
> Hi Sachith,
>
> Which DB are you using to do the profiling?

On Wed, Aug 13, 2014 at 11:51 PM, Sachith Withana <[email protected]> wrote:
> Here's how I've written the script to do it.
>
> Experiments loaded:
> 10 users, 4 projects per user;
> each user has 1,000 to 100,000 experiments (1,000 / 10,000 / 100,000),
> containing experiments such as Echo and AMBER.
>
> Methods tested:
>
> getExperiment()
> searchExperimentByName
> searchExperimentByApplication
> searchExperimentByDescription
>
> WDYT?

On Tue, Aug 12, 2014 at 6:58 PM, Marlon Pierce <[email protected]> wrote:
> You can start with the API search functions that we have now: by name,
> by application, by description.
>
> Marlon

On 8/12/14, 9:25 AM, Lahiru Gunathilake wrote:
> On Tue, Aug 12, 2014 at 6:42 PM, Marlon Pierce <[email protected]> wrote:
>
>> A single user may have O(100) to O(1000) experiments, so 10K is too small
>> as an upper bound on the registry for many users.
>
> +1
>
> I agree with Marlon: we have only the most basic search method, but the
> reality is we need search criteria like those Marlon suggests, and I am sure
> content-based search will be pretty slow with a large number of experiments.
> So we have to use a search platform like Solr to improve the performance.
>
> I think you can first do the performance test without content-based search;
> then we can implement that feature and do the performance analysis again. If
> the performance is too poor (which is likely), we can integrate a search
> platform to improve it.
>
> Lahiru
>
>> We should really test until things break. A plot implying infinite scaling
>> (by extrapolation) is not useful. A plot showing OK scaling up to a certain
>> point before things decay is useful.
>>
>> I suggest you post a more carefully defined set of experiments, starting
>> with Lahiru's suggestion. How many users? How many experiments per user?
>> What kind of searches? Probably the most common will be "get all my
>> experiments that match this string", "get all experiments that have state
>> FAILED", and "get all my experiments from the last 30 days". But the API
>> may not have the latter two yet.
>>
>> So to start, you should specify a prototype user. For example, each user
>> will have 1000 experiments: 100 AMBER jobs, 100 LAMMPS jobs, etc. Each user
>> will have a unique but human-readable name (user1, user2, ...). Each
>> experiment will have a unique, human-readable description (AMBER job 1 for
>> user 1, AMBER job 2 for user 1, ...) that is suitable for searching.
>>
>> Post these details first, and then you can create experiment registries of
>> any size via scripts. Each experiment is different but suitable for pattern
>> searching.
>>
>> This is 10 minutes' worth of thought while waiting for my tea to brew, so
>> hopefully this is the right start, but I encourage you not to take it as
>> fixed instructions.
>>
>> Marlon
>>
>> On 8/12/14, 8:54 AM, Lahiru Gunathilake wrote:
>>
>>> Hi Sachith,
>>>
>>> How did you test this? What database did you use?
>>>
>>> I think 1000 experiments is a very low number. The most important part is,
>>> when there is a large number of experiments, how expensive the search is
>>> and how expensive a single experiment retrieval is.
>>>
>>> If we support getting a defined number of experiments through the API
>>> (I think this is the practical scenario: among 10k experiments, get 100),
>>> we have to test the performance of that too.
>>>
>>> Regards,
>>> Lahiru
>>>
>>> On Tue, Aug 12, 2014 at 4:59 PM, Sachith Withana <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm testing the registry with 10, 1,000, and 10,000 experiments, and I've
>>>> tested the database performance executing the getAllExperiments method.
>>>> I'll post the complete analysis.
>>>>
>>>> What are the other methods that I should test with?
>>>>
>>>> getExperiment(experiment_id)
>>>> searchExperiment
>>>>
>>>> Any pointers?
>>>>
>>>> On Wed, Jul 23, 2014 at 6:07 PM, Marlon Pierce <[email protected]> wrote:
>>>>
>>>>> Thanks, Sachith. Did you look at scaling also? That is, will the
>>>>> operations below still be the slowest if the DB is 10x, 100x, 1000x
>>>>> bigger?
>>>>>
>>>>> Marlon
>>>>>
>>>>> On 7/23/14, 8:22 AM, Sachith Withana wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I'm profiling the current registry in a few different aspects.
>>>>>>
>>>>>> I looked into the database operations and listed the operations that
>>>>>> take the most time.
>>>>>>
>>>>>> 1. Getting the status of an experiment (takes around 10% of the overall
>>>>>>    time spent): it has to go through the hierarchy of the data model
>>>>>>    (nodes, tasks, etc.) to get to the actual experiment status.
>>>>>>
>>>>>> 2. Dealing with the application inputs: strangely, the queries regarding
>>>>>>    the ApplicationInputs take a long time to complete. This is part of
>>>>>>    the new Application Catalog.
>>>>>>
>>>>>> 3. Getting all the experiments (using the * wildcard): this takes the
>>>>>>    most time when first queried, but thanks to OpenJPA caching it
>>>>>>    flattens out as we keep querying.
>>>>>>
>>>>>> To reduce the first issue, I would suggest having a separate table for
>>>>>> Experiment Summaries, where the status (both the state and the state
>>>>>> update time) would be the only varying entity, and using that to improve
>>>>>> the query time for experiment summaries.
>>>>>>
>>>>>> It would also help improve the performance of getting all the
>>>>>> experiments (experiment summaries).
>>>>>>
>>>>>> WDYT?
>>>>>>
>>>>>> ToDos: look into memory consumption (memory leakage, etc.).
>>>>>>
>>>>>> Any more suggestions?

--
Thanks,
Sachith Withana
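[Editor's note] A minimal sketch of the load-and-measure script Sachith outlines at the top of the thread could look like the following. The in-memory store and all names (the Experiment record, the searchBy* helpers, the id scheme) are stand-ins chosen for illustration, not the actual Airavata registry API; in a real run the store would be replaced by the Derby-backed registry client and the timings taken against it.

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of the load-and-measure script outlined in the thread.
 * An in-memory list stands in for the Derby-backed registry so the harness is
 * self-contained; the search method names mirror the ones listed in the mail,
 * but the real Airavata client API may differ.
 */
public class RegistrySearchBenchmark {

    record Experiment(String id, String user, String project, String name,
                      String application, String description) {}

    static final List<Experiment> STORE = new ArrayList<>();

    public static void main(String[] args) {
        int users = 10, projectsPerUser = 4;
        int[] sizes = {1_000, 10_000, 100_000};   // experiments per user; the
                                                  // largest run may need -Xmx
        String[] apps = {"Echo", "AMBER"};

        for (int perUser : sizes) {
            STORE.clear();
            populate(users, projectsPerUser, perUser, apps);
            System.out.println("=== " + perUser + " experiments per user ===");
            String sampleId = STORE.get(STORE.size() / 2).id();
            time("getExperiment",                 () -> getExperiment(sampleId));
            time("searchExperimentByName",        () -> searchByName("user1", "AMBER job 1"));
            time("searchExperimentByApplication", () -> searchByApplication("user1", "AMBER"));
            time("searchExperimentByDescription", () -> searchByDescription("user1", "benchmark"));
        }
    }

    /** Loads users * perUser experiments with human-readable, searchable fields. */
    static void populate(int users, int projects, int perUser, String[] apps) {
        for (int u = 1; u <= users; u++) {
            String user = "user" + u;
            for (int e = 1; e <= perUser; e++) {
                String app = apps[e % apps.length];
                String name = app + " job " + e + " for " + user;
                STORE.add(new Experiment(user + "-exp" + e, user,
                        "project" + (e % projects + 1), name, app,
                        name + " (benchmark data)"));
            }
        }
    }

    static Experiment getExperiment(String id) {
        return STORE.stream().filter(x -> x.id().equals(id)).findFirst().orElse(null);
    }

    static List<Experiment> searchByName(String user, String term) {
        return STORE.stream()
                .filter(x -> x.user().equals(user) && x.name().contains(term))
                .collect(Collectors.toList());
    }

    static List<Experiment> searchByApplication(String user, String app) {
        return STORE.stream()
                .filter(x -> x.user().equals(user) && x.application().equals(app))
                .collect(Collectors.toList());
    }

    static List<Experiment> searchByDescription(String user, String term) {
        return STORE.stream()
                .filter(x -> x.user().equals(user) && x.description().contains(term))
                .collect(Collectors.toList());
    }

    /** Runs the call a few times and prints the average wall-clock latency. */
    static void time(String label, Runnable call) {
        int runs = 5;
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) call.run();
        System.out.printf("%s: %d ms avg%n", label, (System.nanoTime() - start) / runs / 1_000_000);
    }
}

The point of the sketch is the shape of the harness: populate a synthetic registry along the lines of Marlon's prototype-user description, then time each search path at 1,000, 10,000, and 100,000 experiments per user so the scaling curve, and the point where it decays, becomes visible.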

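[Editor's note] For the Experiment Summaries table Sachith proposes in the oldest message of the thread, a rough OpenJPA-style entity illustrating the idea might look like this; the table and column names are assumptions, not the actual Airavata schema.

import java.sql.Timestamp;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

/**
 * Rough illustration of the proposed Experiment Summaries table: one flat row
 * per experiment whose only frequently updated columns are the status and its
 * update time, so summary listings avoid walking the experiment/node/task
 * hierarchy. Table and column names here are assumptions, not the actual
 * Airavata schema.
 */
@Entity
@Table(name = "EXPERIMENT_SUMMARY")
public class ExperimentSummary {

    @Id
    private String experimentId;      // same key as the main experiment table

    private String userName;
    private String projectId;
    private String experimentName;
    private String applicationId;
    private String description;

    // The only columns expected to change after the row is created.
    private String experimentStatus;
    private Timestamp statusUpdateTime;

    // No-arg constructor required by JPA; getters/setters omitted for brevity.
    public ExperimentSummary() {}
}

The intent, as described in the thread, is that status lookups and summary listings hit this flat table instead of traversing the full experiment hierarchy, which is where the profiled 10% of time was going.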