Re: Benchmarking result

John Gemignani Mon, 13 Sep 2021 17:41:31 -0700

Good explanation Josh.

I will add a two comments about the results -


   1. The changes made to the code for this specific improvement were known
   to be small. However, the changes will still decrease the execution time
   overall due to less code needing to be executed.
   2. The VLE code included some of these changes when it was pushed to the
   repository. So, this may have adversely affected the results making it
   appear even less of a change.

It was always known that these would be minor improvements but,
improvements none the less.

I believe that the best use case for these benchmarks is to allow us to see
when new code may positively or negatively impact our code base. That was
one of the reasons for building and using the benchmarks here - to show
there weren't any unforeseen negative impacts to performance and possibly a
small gain. The other use case being for statistics later on.

Going forward, I think we need to have these expanded to test other areas
of the code and more thoroughly. This will allow us to monitor the changes
to performance due to code additions.

Thoughts?

John

On Mon, Sep 13, 2021 at 11:21 AM Josh Innis <josh.in...@bitnine.net> wrote:

> Runtime performance of any query in SQL is dictated by the execution tree
> structure. AGE translates cypher commands into SQL as much as possible.
> Some exceptions are: variable length edges, the merge clause, and the
> CREATE/REMOVE/SET/DELETE clauses. (Sidenote: we do try to use Postgres
> functionality in the exceptions, however by definition, they must deviate
> from functionality allowed in the SQL standard in a notable way). The patch
> that we are benchmarking here does not affect the execution tree. Instead
> what it does is affect the time it takes to create the vertex agtype after
> Postgres has scanned the table the vertex is stored in and given the rest
> of the execution tree a tuple.
>
> A vertex is a map with very specific keys (id, label, and properties).
> This patch optimizes constructing that data type, because Agtype maps are
> ordered. The current version of agtype_build_vertex is anticipating to
> receive it's keys in an unordered way, which is unnecessary. This patch
> will bypass that check.
>
> There is no way to make this patch's performance statistically
> significant. We have reduced the execution tree to almost as little as
> possible. There is a way to simplify it more, which I have added below.
> However, it is not necessary. What we are seeing is exactly what I
> personally expected: a small but not significant improvement. We can
> continue to work on this, but we are improving our performance benchmarking
> tools at this point.
>
> SQL:
>
> SET search_path TO ag_catalog;
> LOAD 'age';
> SELECT create_graph('test_graph');
>
> SELECT *
> FROM f_test( 1000000, $q$
>     SELECT * from ag_catalog.cypher('test_graph', $$
>         CREATE (a:Person {id: 129, name: 'Brian', title: 'Researcher',
> city: 'Munich'})
>         RETURN a
>     $$) as (a agtype)
> $q$);
>
> EXPLAIN ANALYZE SELECT _agtype_build_vertex(P.id, 'Person'::cstring,
> P.properties) from test_graph."Person" as P;
>
> I think this isolates the test to exactly what is needed and nothing more.
> Given the other performance benchmarks, I anticipate about a 2% average
> improvement to runtime.
>
> There are some changes that Viet needs to make to the patch in the Slack
> channel (https://app.slack.com/client/TAET0CD3L/C0102328XEJ). I will add
> instructions there shortly and a tutorial on the 'amend' option for the
> commit command in git. If anyone reading this does not have access to the
> Slack channel, create an email in the dev mailing list and we will include
> you.
>
> Note: Also f_test needs to get renamed if we wanna add it to the
> repository in any way. Given the functionality, a verbose name would be:
> run_dynamic_sql_n_times
>
> We create value for our clients by connecting the world's data.
> Josh Innis Software Engineer
> Core R&D Team
> *P*: (831)278-0327
> 3945 Freedom Circle #260, Santa Clara, CA 95054
> <https://www.google.com/maps/place/3945+Freedom+Cir+%23260,+Santa+Clara,+CA+95054/>
> bitnine.net
>
>
> On Mon, Sep 13, 2021 at 7:34 AM Joe Fagan <joe.fa...@bitnine.net> wrote:
>
>> [image: image.png]
>>
>> Viet,
>>
>> For the results of last week's test unfortunately the optimised code is
>> not statistically significantly different from the non-optimised.
>>
>> When you send the new test results I can run it again (should take only a
>> few minutes now I have it set up) and re-run the test. I can also explain
>> the results if you're interested.
>>
>> Briefly the test is asking whether the before and after numbers could
>> have come from the same code (ie no changes). The results are saying that
>> yes, they could. Statistically the difference in results could (very)
>> easily have come from the same code. There is no significant difference. In
>> other words, you could have used only the before code, ran the test twice,
>> and the difference in results could easily be greater than the difference
>> you have observed in your test.
>>
>> Thanks
>> Joe
>>
>> On Fri, 10 Sept 2021 at 19:57, Joe Fagan <joe.fa...@bitnine.net> wrote:
>>
>>> Great Josh.
>>> I learnt loads reading your script
>>>
>>> On Fri, 10 Sep 2021 at 18:14, Josh Innis <josh.in...@bitnine.net> wrote:
>>>
>>>> Good Job Viet. I made some modifications to the SQL function that you
>>>> made. This may be a way to analyze how long it takes to run the queries in
>>>> a purely postgres way that only calls explain analyze once.
>>>>
>>>> SET search_path TO ag_catalog;
>>>> LOAD 'age';
>>>> SELECT create_graph('test_graph');
>>>>
>>>> CREATE OR REPLACE FUNCTION f_test(ct int, sql text)
>>>> RETURNS VOID
>>>> LANGUAGE plpgsql AS $func$
>>>>    DECLARE   i int;
>>>> BEGIN
>>>>     FOR i IN 1 .. $1 LOOP
>>>>         EXECUTE sql;
>>>>     END LOOP;
>>>> END $func$;
>>>>
>>>> EXPLAIN ANALYZE SELECT *
>>>> FROM f_test( 10000, $q$
>>>>     SELECT * from ag_catalog.cypher('test_graph', $$
>>>>         CREATE (a:Person {id: 129, name: 'Brian', title: 'Researcher',
>>>> city: 'Munich'})
>>>>         RETURN a
>>>>     $$) as (a agtype)
>>>> $q$);
>>>>
>>>> We create value for our clients by connecting the world's data.
>>>> Josh Innis Software Engineer
>>>> Core R&D Team
>>>> *P*: (831)278-0327
>>>> 3945 Freedom Circle #260, Santa Clara, CA 95054
>>>> <https://www.google.com/maps/place/3945+Freedom+Cir+%23260,+Santa+Clara,+CA+95054/>
>>>> bitnine.net
>>>>
>>>>
>>>> On Fri, Sep 10, 2021 at 9:12 AM John Gemignani <
>>>> john.gemign...@bitnine.net> wrote:
>>>>
>>>>> Good job!
>>>>>
>>>>> On Thu, Sep 9, 2021 at 9:11 PM VUONG QUOC Viet <vqv...@bitnine.net>
>>>>> wrote:
>>>>>
>>>>>> Hi Joe and Muhammad,
>>>>>>
>>>>>> I've finished the benchmarking experiments after listening to
>>>>>> Muhammad's suggestions to improve the process: 1. to create 2 separate
>>>>>> repos for original code and optimized code and 2. place the create_graph 
>>>>>> at
>>>>>> the beginning and drop_graph at the end of each sql file.
>>>>>>
>>>>>> The experiments were conducted basically the same way: I calculated
>>>>>> the total time for running 100, 1000 and 10000 sql queries, repeated 20
>>>>>> times consecutively, got the average value, and repeated that process 10
>>>>>> times more. So in essence, there are totally 200 measurements conducted 
>>>>>> for
>>>>>> original and optimized code, each. And I calculated overall average for
>>>>>> total 200 measurements to get a final number to compare between the
>>>>>> original and the optimized: So here's the result:
>>>>>>
>>>>>> Original code (ms) Optimized code (ms) Difference(ms)
>>>>>> 100 queries 29.23397 28.786665 0.447305
>>>>>> 1000 queries 255.391305 250.43654 4.95476500000004
>>>>>> 10000 queries 2558.612315 2517.9374 40.6749150000001
>>>>>>
>>>>>> According to the result, the difference in execution time for 100
>>>>>> queries is that the optimized code was about 0.45ms faster than the
>>>>>> original code. For 1000 queries, the optimized code is 4.95ms faster and
>>>>>> for 10000 queries, it's 40.67ms faster. These numbers, I think, are
>>>>>> reasonable, because as we make the number of queries 10 times larger, the
>>>>>> difference in time also increases around 10 times.
>>>>>>
>>>>>> I've attached the detailed statistics report. Please let me know your
>>>>>> comments and suggestions.
>>>>>>
>>>>>> Best regards,
>>>>>> Viet.
>>>>>>
>>>>>> --
>>>
>>> <http://bitnine.net>
>>>
>>> Joe Fagan
>>>
>>> Community Advocate Director
>>>
>>>
>>>
>>>  M : +44 7788 148772
>>>
>>>
>>>
>>> <https://www.facebook.com/search/top?q=bitnine%20agens>
>>> <http://linkedin.com/company/bitnineglobal> <http://bitnine.net>
>>> <http://age.apache.org>
>>> A Graph Extension for PostgreSQL <http://age.apache.org>In the news: Bitnine
>>> Promoting a First IPO in the Graph Database <https://bwnews.pr/3ds2mUc>
>>>
>>

Re: Benchmarking result

Reply via email to