I have worked with two graph database vendors: Azure Cosmos DB and Neo4j. During our migration to Neo4j, we discovered that using Gremlin wasn’t possible; we had to rewrite all our queries in Cypher, which is Neo4j’s native query language and, in my experience, much simpler for querying.
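To give a feel for the kind of rewrite involved, here is a minimal, purely illustrative sketch in Python. It is not taken from our actual migration: the `person` label, the `knows` relationship, the `name` property, and the `query_for` helper are all hypothetical names chosen for the example. It shows the same lookup ("who does Alice know?") expressed once in Gremlin and once in Cypher.

```python
# Hypothetical example: the same "who does Alice know?" lookup in both languages.
# The 'person' label, 'knows' relationship, and 'name' property are made-up names.

# Gremlin: an imperative, step-by-step traversal.
GREMLIN_QUERY = "g.V().has('person', 'name', 'Alice').out('knows').values('name')"

# Cypher: a declarative pattern match.
CYPHER_QUERY = (
    "MATCH (p:person {name: 'Alice'})-[:knows]->(friend) "
    "RETURN friend.name"
)

QUERIES = {"gremlin": GREMLIN_QUERY, "cypher": CYPHER_QUERY}


def query_for(language: str) -> str:
    """Return the query text for the given language (hypothetical helper)."""
    try:
        return QUERIES[language.lower()]
    except KeyError:
        raise ValueError(f"Unsupported query language: {language!r}") from None
```

Even for a lookup this small, the two query strings have nothing in common, so any shared layer would need to either take the query text as-is per backend or translate between the two.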
This situation highlights a key challenge for a common abstraction: the underlying query languages and the connection/authentication mechanisms vary significantly between vendors. Gremlin not only differs from Cypher in syntax; its support for Neo4j is also deprecated (see https://tinkerpop.apache.org/docs/3.7.3/reference/#neo4j-gremlin). The question is: how can a common approach accommodate these different query languages?

On Fri, Feb 21, 2025 at 7:36 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> Without deep looking at the code I love the idea - it's very similar to
> what we have for common.sql and common.io - and soon common.messaging - I
> also - long time ago - suggested common.dataframe that someone could submit
> using Apache Ibis:
> https://lists.apache.org/thread/qx3yh6h0l6jb0kh3fz9q95b3x5b4001l -
> similarly I believe there was an idea about common.llm ...
>
> I think the "common" pattern is a great one for Airflow, to build on top of
> "other giants" who build those common abstractions that you can easily
> switch between different implementations of various data access layers.
>
> My suggestion and question - would be however (not very strong on it, I
> would love to hear what others think, I know it's been somewhat contentious
> when I started the ibis discussion) - would be to make it "common.graph",
> "common.dataframe" - instead of "apache.gremlin" or "apache.ibis" - just to
> stress that those are not implementations of particular service but
> opinionated choice of particular technology to do "common" operations. This
> is what essentially "common.io" is - it should be named "fsspec" provider
> if we were to name it by the "library" that implemented it.
>
> J.
>
> On Fri, Feb 21, 2025 at 8:22 PM Ahmad Farhan <ahmad.farhan9...@gmail.com>
> wrote:
>
> > Hi Everyone,
> >
> > I’ve created a draft PR (https://github.com/apache/airflow/pull/46977) to
> > introduce and discuss a new provider for using Gremlin, the graph traversal
> > language of Apache TinkerPop (more details here:
> > https://tinkerpop.apache.org/gremlin.html). Gremlin is supported by various
> > graph database vendors such as Azure Cosmos DB and Amazon Neptune.
> > Previously, I had to develop a custom hook to query data from Azure Cosmos
> > DB using Apache Gremlin.
> >
> > I managed to create a provider and run it locally on the main branch.
> > However, I ran into the BaseHook issue
> > (https://github.com/apache/airflow/issues/45233) on that branch, so I ended
> > up testing it fully on the v2-10-test branch. The PR should be complete,
> > but I’ve kept it as a draft for now while we discuss the provider.
> >
> > I’m a new contributor, so I’m especially eager to hear your feedback.
> > Comments on the PR are very welcome, and please feel free to reach out with
> > any questions via email or Slack.
> >
> > Thanks,
> > Ahmad Farhan