I pushed changes to move the provider into the “apache” directory. After
updating the class references across the project, I re-tested and all tests
passed.

Regarding the use of Gremlin (or another graph query language such as Cypher
or SPARQL) for a common package approach, here are my thoughts on the pros
and cons:

Pros (I can see only one):

   - Gremlin has been widely adopted by different cloud vendors (e.g. Azure
   Cosmos DB for Apache Gremlin and Amazon Neptune) as well as in self-hosted
   environments.

Cons:

   - Gremlin, Cypher (native for Neo4j) and SPARQL each have their own
   drivers for executing queries.
   - To achieve a common abstraction, a wrapper around each driver would be
   required. Each driver has its own connection parameters, underlying
   protocols, and may need method overrides for compatibility with different
   Python versions.
   - Not all vendors support every query language; for instance, Gremlin
   for Neo4j has been deprecated in recent releases, while Cosmos DB does not
   support Cypher or SPARQL.
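
To make the wrapper point concrete, here is a minimal Python sketch of what a
common layer might look like. It is an illustration only: the class names, the
`executor` stand-in for a real driver call, the example hosts, and the endpoint
formats are all hypothetical and are not taken from the PR or from any driver's
API. The three queries are meant to be rough equivalents ("names of all
people") in each language.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, List

# Stand-in for a real driver call: takes a query string and returns rows.
# A real wrapper would call the Gremlin, Neo4j, or SPARQL client here.
Executor = Callable[[str], List[dict]]


class GraphRunner(ABC):
    """Hypothetical common interface; not an Airflow or driver API."""

    language = ""  # overridden by each subclass

    @abstractmethod
    def endpoint(self) -> str:
        """Build the connection URL from language-specific parameters."""

    @abstractmethod
    def run(self, query: str) -> List[dict]:
        """Execute a query written in this runner's language."""


@dataclass
class GremlinRunner(GraphRunner):
    host: str
    executor: Executor
    port: int = 8182  # default Gremlin Server port
    language = "gremlin"

    def endpoint(self) -> str:
        return f"wss://{self.host}:{self.port}/gremlin"

    def run(self, query: str) -> List[dict]:
        return self.executor(query)


@dataclass
class CypherRunner(GraphRunner):
    host: str
    database: str  # a parameter the Gremlin wrapper does not need
    executor: Executor
    port: int = 7687  # default Bolt port
    language = "cypher"

    def endpoint(self) -> str:
        return f"bolt://{self.host}:{self.port}"

    def run(self, query: str) -> List[dict]:
        return self.executor(query)


@dataclass
class SparqlRunner(GraphRunner):
    base_url: str  # SPARQL endpoints are plain HTTP(S) URLs
    executor: Executor
    language = "sparql"

    def endpoint(self) -> str:
        return f"{self.base_url}/sparql"

    def run(self, query: str) -> List[dict]:
        return self.executor(query)


# The "same" question still has to be written once per language.
QUERIES: Dict[str, str] = {
    "gremlin": "g.V().hasLabel('person').values('name')",
    "cypher": "MATCH (p:person) RETURN p.name",
    "sparql": "PREFIX : <http://example.org/> "
              "SELECT ?name WHERE { ?p a :person ; :name ?name }",
}


def run_native(runner: GraphRunner, queries: Dict[str, str]) -> List[dict]:
    """Pick the query matching the runner's language, or fail loudly."""
    if runner.language not in queries:
        raise ValueError(f"no {runner.language} query provided")
    return runner.run(queries[runner.language])


def fake_executor(query: str) -> List[dict]:
    # Echo the query so the example runs without a database.
    return [{"query": query}]


runners = [
    GremlinRunner("graph.example.com", fake_executor),
    CypherRunner("graph.example.com", "neo4j", fake_executor),
    SparqlRunner("https://graph.example.com", fake_executor),
]
for runner in runners:
    print(runner.language, runner.endpoint(), run_native(runner, QUERIES))
```

Even in this toy version, the shared interface only hides the `run` call. The
constructor parameters differ per language, and the caller still has to supply
a separate query for each one, which is the core of my concern.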

While it would be ideal to have a unified graph query language and driver
that works seamlessly across different vendors, such a solution does not
exist at the moment. In my opinion, implementing provider-specific
solutions for each query language (Gremlin, Cypher, SPARQL) is more
realistic and practical given the current landscape.

Happy to discuss further or answer any questions!

Farhan

On Mon, Feb 24, 2025 at 11:33 AM Ahmad Farhan <ahmad.farhan9...@gmail.com>
wrote:

> I have worked with two different graph database vendors—Azure Cosmos DB
> and Neo4j. During our migration to Neo4j, we discovered that using the
> Gremlin language wasn’t possible; we were forced to rewrite all our queries
> into Cypher, which is the native language for Neo4j and, in my experience,
> much simpler for querying.
>
> This situation highlights a key challenge for a common abstraction: the
> underlying query languages and connection/authentication mechanisms vary
> significantly. Gremlin is not only different from Cypher in syntax but is
> also deprecated for Neo4j (see
> https://tinkerpop.apache.org/docs/3.7.3/reference/#neo4j-gremlin).
>
> The question is: how can a common approach accommodate these different
> query languages?
>
> On Fri, Feb 21, 2025 at 7:36 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Without looking deeply at the code, I love the idea - it's very similar to
>> what we have for common.sql and common.io - and soon common.messaging. A
>> long time ago I also suggested a common.dataframe that someone could submit
>> using Apache Ibis:
>> https://lists.apache.org/thread/qx3yh6h0l6jb0kh3fz9q95b3x5b4001l -
>> similarly, I believe there was an idea about common.llm ...
>>
>> I think the "common" pattern is a great one for Airflow: building on top of
>> "other giants" who build those common abstractions, so that you can easily
>> switch between different implementations of various data access layers.
>>
>> My suggestion and question, however (I'm not very strong on it and would
>> love to hear what others think; I know it was somewhat contentious when I
>> started the Ibis discussion), would be to make it "common.graph" and
>> "common.dataframe" instead of "apache.gremlin" or "apache.ibis", just to
>> stress that those are not implementations of a particular service but an
>> opinionated choice of a particular technology to do "common" operations.
>> This is essentially what "common.io" is: it would be named the "fsspec"
>> provider if we were to name it by the library that implements it.
>>
>> J.
>>
>>
>> On Fri, Feb 21, 2025 at 8:22 PM Ahmad Farhan <ahmad.farhan9...@gmail.com>
>> wrote:
>>
>> > Hi Everyone,
>> >
>> > I’ve created a draft PR (https://github.com/apache/airflow/pull/46977) to
>> > introduce and discuss a new provider for using Gremlin, the graph traversal
>> > language of Apache TinkerPop (more details here:
>> > https://tinkerpop.apache.org/gremlin.html). Gremlin is supported by various
>> > graph database vendors such as Azure Cosmos DB and Amazon Neptune.
>> > Previously, I had to develop a custom hook to query data from Azure Cosmos
>> > DB using Apache Gremlin.
>> >
>> > I managed to create a provider and run it locally on the main branch.
>> > However, I ran into the BaseHook issue (
>> > https://github.com/apache/airflow/issues/45233) on that branch, so I ended
>> > up testing it fully on the v2-10-test branch. The PR should be complete,
>> > but I’ve kept it as a draft for now while we discuss the provider.
>> >
>> > I’m a new contributor, so I’m especially eager to hear your feedback.
>> > Comments on the PR are very welcome, and please feel free to reach out with
>> > any questions via email or Slack.
>> >
>> > Thanks,
>> > Ahmad Farhan
>> >
>>
>
