The runtime performance of any SQL query is dictated by the structure of its
execution tree. AGE translates Cypher commands into plain SQL wherever
possible. The exceptions are variable-length edges, the MERGE clause, and the
CREATE/REMOVE/SET/DELETE clauses. (Side note: we do try to use Postgres
functionality in those exceptions, but by definition they must deviate from
what the SQL standard allows in a notable way.) The patch we are benchmarking
here does not change the execution tree. What it changes is the time it takes
to build the vertex agtype after Postgres has scanned the table the vertex is
stored in and handed a tuple to the rest of the execution tree.
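
As a quick sanity check (a sketch, assuming the test_graph setup used below),
comparing EXPLAIN output before and after the patch should show an identical
plan, since only the projection that builds the vertex changes:

-- Plan shape should be unchanged by the patch; only the per-tuple
-- projection cost moves.
EXPLAIN SELECT *
FROM ag_catalog.cypher('test_graph', $$
    MATCH (a:Person) RETURN a
$$) AS (a agtype);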

A vertex is a map with three fixed keys (id, label, and properties). This
patch optimizes the construction of that data type by exploiting the fact
that agtype maps are ordered. The current version of agtype_build_vertex
expects to receive its keys in an unordered way, which is unnecessary here
because a vertex's keys are always the same. This patch bypasses that check.
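
To illustrate the shape being built (a sketch; the id value is made up, and
a real vertex value additionally carries a vertex type annotation):

-- An agtype map with the three fixed vertex keys. The id is invented
-- purely for this example.
SELECT '{"id": 844424930131969, "label": "Person",
         "properties": {"name": "Brian"}}'::ag_catalog.agtype;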

There is no way to make this patch's effect statistically significant. We
have already reduced the execution tree to nearly the minimum. There is a way
to simplify it further, which I have added below, but it is not necessary.
What we are seeing is exactly what I expected: a small but not statistically
significant improvement. We can keep working on this, but at this point we
are really improving our performance benchmarking tools.

SQL:

SET search_path TO ag_catalog;
LOAD 'age';
SELECT create_graph('test_graph');

SELECT *
FROM f_test(1000000, $q$
    SELECT * FROM ag_catalog.cypher('test_graph', $$
        CREATE (a:Person {id: 129, name: 'Brian', title: 'Researcher',
                city: 'Munich'})
        RETURN a
    $$) AS (a agtype)
$q$);

EXPLAIN ANALYZE
SELECT _agtype_build_vertex(P.id, 'Person'::cstring, P.properties)
FROM test_graph."Person" AS P;
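
If we want to subtract the cost of the scan itself from that measurement, one
option (a sketch against the same table) is a baseline query that scans
without building the vertex:

-- Baseline: identical seq scan, no vertex construction.
EXPLAIN ANALYZE SELECT P.id, P.properties
FROM test_graph."Person" AS P;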

I think this isolates the test to exactly what is needed and nothing more.
Given the other performance benchmarks, I anticipate roughly a 2% average
improvement in runtime.

There are some changes Viet needs to make to the patch, which I will describe
in the Slack channel (https://app.slack.com/client/TAET0CD3L/C0102328XEJ). I
will add instructions there shortly, along with a tutorial on the 'amend'
option of git's commit command. If anyone reading this does not have access
to the Slack channel, send an email to the dev mailing list and we will
include you.

Note: f_test also needs to be renamed if we want to add it to the repository
in any form. Given its functionality, a descriptive name would be:
run_dynamic_sql_n_times
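
For reference, the renamed helper would be identical to the f_test definition
quoted below, only under the new name:

CREATE OR REPLACE FUNCTION run_dynamic_sql_n_times(ct int, sql text)
RETURNS VOID
LANGUAGE plpgsql AS $func$
BEGIN
    -- Execute the supplied SQL string ct times; the loop variable is
    -- declared implicitly by the integer FOR loop.
    FOR i IN 1 .. ct LOOP
        EXECUTE sql;
    END LOOP;
END $func$;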



On Mon, Sep 13, 2021 at 7:34 AM Joe Fagan <joe.fa...@bitnine.net> wrote:

>
> Viet,
>
> Regarding the results of last week's test: unfortunately, the optimised
> code is not statistically significantly different from the non-optimised
> code.
>
> When you send the new test results, I can re-run the analysis (it should
> take only a few minutes now that I have it set up). I can also explain
> the results if you're interested.
>
> Briefly, the test asks whether the before and after numbers could have
> come from the same code (i.e. no changes). The results say that yes, they
> could: statistically, the difference in results could (very) easily have
> come from the same code. There is no significant difference. In other
> words, you could have used only the before code, run the test twice, and
> the difference in results could easily be greater than the difference you
> observed in your test.
>
> Thanks
> Joe
>
> On Fri, 10 Sept 2021 at 19:57, Joe Fagan <joe.fa...@bitnine.net> wrote:
>
>> Great, Josh.
>> I learnt loads reading your script.
>>
>> On Fri, 10 Sep 2021 at 18:14, Josh Innis <josh.in...@bitnine.net> wrote:
>>
>>> Good job, Viet. I made some modifications to the SQL function that you
>>> wrote. This may be a way to measure how long the queries take in a
>>> purely Postgres way that calls EXPLAIN ANALYZE only once.
>>>
>>> SET search_path TO ag_catalog;
>>> LOAD 'age';
>>> SELECT create_graph('test_graph');
>>>
>>> CREATE OR REPLACE FUNCTION f_test(ct int, sql text)
>>> RETURNS VOID
>>> LANGUAGE plpgsql AS $func$
>>> BEGIN
>>>     FOR i IN 1 .. ct LOOP
>>>         EXECUTE sql;
>>>     END LOOP;
>>> END $func$;
>>>
>>> EXPLAIN ANALYZE SELECT *
>>> FROM f_test(10000, $q$
>>>     SELECT * FROM ag_catalog.cypher('test_graph', $$
>>>         CREATE (a:Person {id: 129, name: 'Brian', title: 'Researcher',
>>>                 city: 'Munich'})
>>>         RETURN a
>>>     $$) AS (a agtype)
>>> $q$);
>>>
>>>
>>>
>>> On Fri, Sep 10, 2021 at 9:12 AM John Gemignani <
>>> john.gemign...@bitnine.net> wrote:
>>>
>>>> Good job!
>>>>
>>>> On Thu, Sep 9, 2021 at 9:11 PM VUONG QUOC Viet <vqv...@bitnine.net>
>>>> wrote:
>>>>
>>>>> Hi Joe and Muhammad,
>>>>>
>>>>> I've finished the benchmarking experiments after incorporating
>>>>> Muhammad's suggestions for improving the process: 1. create two
>>>>> separate repos for the original and optimized code, and 2. place
>>>>> create_graph at the beginning and drop_graph at the end of each SQL
>>>>> file.
>>>>>
>>>>> The experiments were conducted basically the same way as before: I
>>>>> measured the total time for running 100, 1000 and 10000 SQL queries,
>>>>> repeated 20 times consecutively, took the average, and then repeated
>>>>> that process 10 more times. In essence, 200 measurements were taken
>>>>> for each of the original and optimized code, and I computed the
>>>>> overall average of those 200 measurements to get a single final
>>>>> number for comparison. Here is the result:
>>>>>
>>>>>                Original code (ms)   Optimized code (ms)   Difference (ms)
>>>>> 100 queries            29.23397             28.786665            0.447305
>>>>> 1000 queries          255.391305           250.43654             4.954765
>>>>> 10000 queries        2558.612315          2517.9374             40.674915
>>>>>
>>>>> According to the results, for 100 queries the optimized code was
>>>>> about 0.45 ms faster than the original code. For 1000 queries, the
>>>>> optimized code is 4.95 ms faster, and for 10000 queries it is 40.67
>>>>> ms faster. These numbers seem reasonable to me: as the number of
>>>>> queries grows by a factor of 10, the difference in time also grows
>>>>> by roughly a factor of 10.
>>>>>
>>>>> I've attached the detailed statistics report. Please let me know your
>>>>> comments and suggestions.
>>>>>
>>>>> Best regards,
>>>>> Viet.
>>>>>
