Hi Sachith,

Which DB are you using to do the profiling?
On Wed, Aug 13, 2014 at 11:51 PM, Sachith Withana <[email protected]> wrote:
> Here's how I've written the script to do it.
>
> Experiments loaded:
> 10 users, 4 projects per user;
> each user would have 1,000 to 100,000 experiments (1,000 / 10,000 / 100,000),
> containing experiments such as Echo and Amber.
>
> Methods tested:
>
> getExperiment()
> searchExperimentByName
> searchExperimentByApplication
> searchExperimentByDescription
>
> WDYT?
>
> On Tue, Aug 12, 2014 at 6:58 PM, Marlon Pierce <[email protected]> wrote:
>
>> You can start with the API search functions that we have now: by name, by
>> application, by description.
>>
>> Marlon
>>
>> On 8/12/14, 9:25 AM, Lahiru Gunathilake wrote:
>>
>>> On Tue, Aug 12, 2014 at 6:42 PM, Marlon Pierce <[email protected]> wrote:
>>>
>>>> A single user may have O(100) to O(1000) experiments, so 10K is too
>>>> small as an upper bound on the registry for many users.
>>>
>>> +1
>>>
>>> I agree with Marlon: we have the most basic search method, but the
>>> reality is that we need search criteria like Marlon suggests, and I am
>>> sure content-based search will be pretty slow with a large number of
>>> experiments. So we have to use a search platform like Solr to improve
>>> the performance.
>>>
>>> I think you can first do the performance test without content-based
>>> search; then we can implement that feature and run the performance
>>> analysis again. If it is too slow (most likely), we can integrate a
>>> search platform to improve the performance.
>>>
>>> Lahiru
>>>
>>>> We should really test until things break. A plot implying infinite
>>>> scaling (by extrapolation) is not useful. A plot showing OK scaling up
>>>> to a certain point before things decay is useful.
>>>>
>>>> I suggest you post a more carefully planned set of experiments, starting
>>>> with Lahiru's suggestion. How many users? How many experiments per user?
>>>> What kind of searches?
>>>> Probably the most common will be "get all my experiments that match
>>>> this string", "get all experiments that have state FAILED", and "get
>>>> all my experiments from the last 30 days". But the API may not have
>>>> the latter two yet.
>>>>
>>>> So to start, you should specify a prototype user. For example, each
>>>> user will have 1000 experiments: 100 AMBER jobs, 100 LAMMPS jobs, etc.
>>>> Each user will have a unique but human-readable name (user1, user2,
>>>> ...). Each experiment will have a unique, human-readable description
>>>> (AMBER job 1 for user 1, AMBER job 2 for user 1, ...) that is suitable
>>>> for searching.
>>>>
>>>> Post these details first, and then you can create experiment registries
>>>> of any size via scripts. Each experiment is different but suitable for
>>>> pattern searching.
>>>>
>>>> This is 10 minutes' worth of thought while waiting for my tea to brew,
>>>> so hopefully this is the right start, but I encourage you not to take
>>>> this as fixed instructions.
>>>>
>>>> Marlon
>>>>
>>>> On 8/12/14, 8:54 AM, Lahiru Gunathilake wrote:
>>>>
>>>>> Hi Sachith,
>>>>>
>>>>> How did you test this? What database did you use?
>>>>>
>>>>> I think 1000 experiments is a very low number. The most important part
>>>>> is when there is a large number of experiments: how expensive is the
>>>>> search, and how expensive is a single experiment retrieval?
>>>>>
>>>>> If we support getting a defined number of experiments in the API (I
>>>>> think this is the practical scenario: among 10K experiments, get 100),
>>>>> we have to test the performance of that too.
>>>>>
>>>>> Regards
>>>>> Lahiru
>>>>>
>>>>> On Tue, Aug 12, 2014 at 4:59 PM, Sachith Withana <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I'm testing the registry with 10, 1,000, and 10,000 experiments, and
>>>>>> I've tested the database performance by executing the
>>>>>> getAllExperiments method.
>>>>>> I'll post the complete analysis.
>>>>>>
>>>>>> What are the other methods that I should test?
>>>>>>
>>>>>> getExperiment(experiment_id)
>>>>>> searchExperiment
>>>>>>
>>>>>> Any pointers?
>>>>>>
>>>>>> On Wed, Jul 23, 2014 at 6:07 PM, Marlon Pierce <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks, Sachith. Did you look at scaling also? That is, will the
>>>>>>> operations below still be the slowest if the DB is 10x, 100x, or
>>>>>>> 1000x bigger?
>>>>>>>
>>>>>>> Marlon
>>>>>>>
>>>>>>> On 7/23/14, 8:22 AM, Sachith Withana wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I'm profiling the current registry in a few different aspects.
>>>>>>>>
>>>>>>>> I looked into the database operations, and I've listed the
>>>>>>>> operations that take the most time:
>>>>>>>>
>>>>>>>> 1. Getting the status of an experiment (takes around 10% of the
>>>>>>>> overall time spent). It has to go through the hierarchy of the
>>>>>>>> data model (nodes, tasks, etc.) to get to the actual experiment
>>>>>>>> status.
>>>>>>>>
>>>>>>>> 2. Dealing with the application inputs. Strangely, queries
>>>>>>>> involving the ApplicationInputs take a long time to complete.
>>>>>>>> This is part of the new Application Catalog.
>>>>>>>>
>>>>>>>> 3. Getting all the experiments (using the * wildcard). This takes
>>>>>>>> the most time when first queried, but thanks to the OpenJPA
>>>>>>>> caching, it flattens out as we keep querying.
>>>>>>>>
>>>>>>>> To address the first issue, I would suggest having a separate
>>>>>>>> table for experiment summaries, where the status (both the state
>>>>>>>> and the state update time) would be the only varying entity, and
>>>>>>>> using that to improve the query time for experiment summaries.
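The summary-table idea above can be sketched concretely. The snippet below is a minimal, standalone illustration using SQLite with invented table and column names, not the actual Airavata registry schema: a denormalized row per experiment makes the status lookup a single indexed read instead of a walk through the experiment/node/task hierarchy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalized summary table: one row per experiment, holding only the
# fields a summary view needs. The state and its update time are the
# only columns that change after insertion.
cur.execute("""
    CREATE TABLE experiment_summary (
        experiment_id TEXT PRIMARY KEY,
        user_name     TEXT,
        name          TEXT,
        application   TEXT,
        state         TEXT,
        state_updated TEXT
    )
""")
cur.execute("CREATE INDEX idx_summary_user ON experiment_summary (user_name)")

cur.execute(
    "INSERT INTO experiment_summary VALUES (?, ?, ?, ?, ?, ?)",
    ("exp1", "user1", "AMBER job 1 for user 1", "AMBER",
     "EXECUTING", "2014-08-13T23:51:00"),
)

# Status lookup is now a single primary-key read, not a hierarchy walk.
state, updated = cur.execute(
    "SELECT state, state_updated FROM experiment_summary "
    "WHERE experiment_id = ?", ("exp1",)
).fetchone()
print(state, updated)
```

The trade-off, of course, is that every status change must now also update the summary row, so writes get slightly more expensive in exchange for cheap summary reads.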
>>>>>>>> It would also help improve the performance of getting all the
>>>>>>>> experiments (experiment summaries).
>>>>>>>>
>>>>>>>> WDYT?
>>>>>>>>
>>>>>>>> ToDos: look into memory consumption (in terms of memory leakage,
>>>>>>>> etc.).
>>>>>>>>
>>>>>>>> Any more suggestions?
>>>>>>
>>>>>> --
>>>>>> Thanks,
>>>>>> Sachith Withana
>
> --
> Thanks,
> Sachith Withana
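Sachith's test plan at the top of the thread can be prototyped as a small timing harness. The sketch below is a stand-in under stated assumptions: it uses SQLite and an invented flat schema instead of the real Airavata registry API, scales the plan down to 10 users with 1,000 experiments each, and expresses the four methods (getExperiment, searchExperimentByName, searchExperimentByApplication, searchExperimentByDescription) as their SQL equivalents.

```python
import sqlite3
import time

# Stand-in registry: a single flat table. Table and column names are
# invented for illustration; the real registry schema differs.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE experiment (
        experiment_id TEXT PRIMARY KEY,
        user_name     TEXT,
        name          TEXT,
        application   TEXT,
        description   TEXT
    )
""")

# Load the prototype users: 10 users x 1,000 experiments each, cycled
# across the applications, with searchable human-readable names, as in
# Marlon's prototype-user suggestion.
apps = ["Echo", "AMBER", "LAMMPS"]
rows = []
for u in range(10):
    for e in range(1000):
        app = apps[e % len(apps)]
        label = f"{app} job {e} for user {u}"
        rows.append((f"exp-{u}-{e}", f"user{u}", label, app, label))
cur.executemany("INSERT INTO experiment VALUES (?, ?, ?, ?, ?)", rows)

def timed(query, params):
    """Run a query and return (elapsed_seconds, row_count)."""
    start = time.perf_counter()
    result = cur.execute(query, params).fetchall()
    return time.perf_counter() - start, len(result)

# The four operations from the test plan, as SQL equivalents.
t_id, n_by_id = timed(
    "SELECT * FROM experiment WHERE experiment_id = ?", ("exp-3-42",))
t_name, n_by_name = timed(
    "SELECT * FROM experiment WHERE name LIKE ?", ("%job 42 for user 3%",))
t_app, n_by_app = timed(
    "SELECT * FROM experiment WHERE application = ?", ("AMBER",))
t_desc, n_by_desc = timed(
    "SELECT * FROM experiment WHERE description LIKE ?", ("%user 3%",))

for op, t, n in [("by id", t_id, n_by_id), ("by name", t_name, n_by_name),
                 ("by app", t_app, n_by_app), ("by desc", t_desc, n_by_desc)]:
    print(f"{op}: {n} rows in {t * 1000:.2f} ms")
```

Rerunning the populate step with 10,000 and 100,000 experiments per user gives the scaling curve Marlon asked for; the `LIKE '%...%'` searches are the ones expected to degrade first, since they force a full scan, which is where a Solr-style index would come in.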
