Hi all,

Upon thinking about this more, I realized there is an underlying issue that
needs to be discussed, one that directly affects the approach needed for
this feature. To do that, I need to talk about an early design mistake that
was made on this project.

Back in 2019, when this project was still called Agensgraph-Extension and
there was no consideration of donating it to Apache, the project ignored a
basic rule of programming: don't use deprecated features. In particular:
the use of OIDs by non-pg_catalog tables. OID stands for Object Identifier.
In short, Postgres uses OIDs to assign each system table row a unique value
that no other system table row has. Earlier versions of Postgres also gave
users the right to mark their own tables as "system" tables and give their
rows an OID (an OID that no Postgres system table row had either).

This feature was deprecated in Postgres 8.5. We started development of AGE
on Postgres 11. To summarize a decision that was made in the first three
months of the project, the people involved (some, but not all, of whom are
no longer working on the project) thought: "The reason Postgres deprecated
this is that they don't want systems built on top of Postgres to treat
their own tables as system tables. We are not technically a 'system built
on top of Postgres'; we are an extension of Postgres. Therefore our
metadata tables are technically system tables. Therefore our unique
identifiers should be unique across all of Postgres. Therefore we should
use OIDs... Oh, it's deprecated. We should still use them. Why not, what's
the worst that could happen?"

The worst that could happen is what Postgres did in version 12: it removed
the ability to add OIDs to user tables (which is what our tables are). They
removed 'WITH OIDS' from their SQL grammar implementation, and certain
internal Postgres C functions that used to return an Oid to us no longer
do. This is why AGE is incompatible with Postgres 12+ (NOTE: there may be
other issues too, but this is definitely THE issue causing us problems).
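
To give a purely illustrative picture of that C-level change (this is not
actual AGE code): on Postgres 11, a catalog insert such as
CatalogTupleInsert() handed back the new row's Oid, while on 12+ it returns
void, so any code that depended on that return value has to do something
else entirely.

    #include "postgres.h"
    #include "access/htup.h"
    #include "catalog/indexing.h"
    #include "utils/rel.h"

    static Oid
    insert_metadata_tuple(Relation rel, HeapTuple tup)
    {
    #if PG_VERSION_NUM < 120000
        /* Postgres 11: the catalog insert hands back the new row's Oid. */
        return CatalogTupleInsert(rel, tup);
    #else
        /*
         * Postgres 12+: WITH OIDS is gone, CatalogTupleInsert() returns
         * void, and the extension has to generate and track its own
         * identifiers instead.
         */
        CatalogTupleInsert(rel, tup);
        return InvalidOid;
    #endif
    }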

Our goal since then has been to implement as much of the openCypher spec as
possible, which is why we have not dealt with this problem yet. The good
news is that none of the features implemented since we realized this
mistake have made the job of changing AGE to fix it take any more time.

The reason I am bringing this up now is that it makes Option #1 clearly the
superior option. We should funnel as much functionality as we can through
ANSI SQL and the openCypher specification. The internal C functions of
Postgres are not part of those standards. Therefore, this bulk import tool
needs to create dynamic SQL queries on a row-by-row basis, rather than use
the internal Postgres functions to bypass those standards, even though it
is slower.
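
To make the row-per-row idea concrete, here is a minimal sketch of what
that could look like, assuming a hypothetical helper has already turned one
CSV row into a label name and a Cypher property-map string, and that the
caller has already set up an SPI connection. The function and argument
names are illustrative and are not part of AGE.

    #include "postgres.h"
    #include "executor/spi.h"
    #include "lib/stringinfo.h"

    /*
     * Build one openCypher CREATE statement per CSV row and run it through
     * the normal SQL/Cypher front door via SPI, instead of calling the
     * internal Postgres insert functions directly. Real code must validate
     * and escape graph_name, label, and properties before pasting them
     * into a query string.
     */
    static void
    insert_row_as_cypher(const char *graph_name, const char *label,
                         const char *properties)
    {
        StringInfoData buf;

        initStringInfo(&buf);
        appendStringInfo(&buf,
                         "SELECT * FROM cypher('%s', $$ CREATE (:%s %s) $$) "
                         "AS (a agtype);",
                         graph_name, label, properties);

        if (SPI_execute(buf.data, false, 0) != SPI_OK_SELECT)
            elog(ERROR, "bulk import: CREATE for label \"%s\" failed", label);

        pfree(buf.data);
    }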

With all of that said: we need to deal with this issue soon. In other
words: soon == 2021.

TLDR: Option #1 is better.

We create value for our clients by connecting the world's data.
Josh Innis
Software Engineer
Core R&D Team
*P*: (831)278-0327
3945 Freedom Circle #260, Santa Clara, CA 95054
<https://www.google.com/maps/place/3945+Freedom+Cir+%23260,+Santa+Clara,+CA+95054/>
bitnine.net


On Wed, Sep 1, 2021 at 10:16 AM Josh Innis <josh.in...@bitnine.net> wrote:

> Muhammed,
>
> Directly calling the functions that insert data into the vertex and edge
> tables might not be the best way: they are tightly integrated with the
> Postgres stages of execution (Parse, Transform, Optimize, and Execute).
> There are two options I would recommend:
>
>
>    1. Dynamically create either a Cypher CREATE query or an SQL INSERT
>    query.
>    2. Take a look at how we insert data into our ag_catalog tables
>    ag_graph and ag_label; both are located in src/backend/catalog. This
>    would decouple a great amount of our insertion logic from Postgres
>    (though obviously not completely). Looking at agtype_build_map in
>    src/backend/utils/adt/agtype.c would give you an idea of how to create
>    the property map from the spreadsheet.
>
> Both solutions have potential drawbacks. Dynamically creating a Cypher/SQL
> query will be slower; however, you would be using the standard interfaces
> provided by both us and Postgres (via the SQL and openCypher language
> specifications). That would not be the case with the second option, though
> that option could decouple some logic that might be useful for later
> features.
>
> We create value for our clients by connecting the world's data.
> Josh Innis
> Software Engineer
> Core R&D Team
> *P*: (831)278-0327
> 3945 Freedom Circle #260, Santa Clara, CA 95054
> <https://www.google.com/maps/place/3945+Freedom+Cir+%23260,+Santa+Clara,+CA+95054/>
> bitnine.net
>
>
> On Wed, Sep 1, 2021 at 9:42 AM Muhammad Shoaib <sho...@bitnine.net> wrote:
>
>> Hi Folks,
>>
>> I am trying to write a function to load nodes from a CSV file into AGE. I
>> have the data in a C data structure but don't know how to proceed further.
>> My particular questions are:
>>
>> Which AGE data structure do I need to use?
>> Which function do I need to call in order to insert the data into PG tables?
>>
>> Looking forward to hearing from you.
>>
>> Kind regards
>> Shoaib
>>
>
