Yeah. "apache/gremlin" seems like a better option then. Does anyone have
anything against it? I think we are pretty happy with accepting "other
apache" projects as providers, so I see no issue with Gremlin - knowing
that we can always reach out to our friendly Apache Community in case of
any issues. So - unless we hear any "opposition" in a few days, I
think it would make sense if you start a `[LAZY CONSENSUS]` thread -
without a need for a `[VOTE]` thread.

One thing though that I would love to have - is an integration
test if possible (we had it with apache.kafka for example) - those are
tests that could run **some** graph database locally (via docker-compose)
and run very rudimentary checks against a "real" database, not a mocked
call. That would make it more robust.

More about integration tests, how to build, run, test them and integrate
them in our CI can be found here:
https://github.com/apache/airflow/blob/main/contributing-docs/testing/integration_tests.rst
- happy to help if you are stuck with it.
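
To illustrate the kind of rudimentary check meant above, here is a minimal
sketch. It assumes a Gremlin Server started via docker-compose and listening
on localhost:8182 (the default Gremlin Server port), and the gremlinpython
driver installed. The helper names (`server_reachable`, `count_vertices`)
are hypothetical and not taken from the PR.

```python
import socket


def server_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def count_vertices(url: str = "ws://localhost:8182/gremlin") -> int:
    """Run one rudimentary query against a real (not mocked) Gremlin server."""
    # Imported lazily so this module still loads where gremlinpython is absent.
    from gremlin_python.driver import client

    gremlin_client = client.Client(url, "g")
    try:
        # submit() returns a ResultSet; all() gives a future of the full result.
        result = gremlin_client.submit("g.V().count()").all().result()
        return result[0]
    finally:
        gremlin_client.close()


def test_gremlin_round_trip():
    import pytest

    # Assumption: the server is expected on localhost:8182; skip otherwise.
    if not server_reachable("localhost", 8182):
        pytest.skip("no Gremlin server reachable on localhost:8182")
    assert count_vertices() >= 0
```

The reachability check only keeps the test from failing on machines where
the database container is not running; in CI the integration setup would
guarantee the server is up.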

J.


On Wed, Feb 26, 2025 at 1:25 PM Ahmad Farhan <ahmad.farhan9...@gmail.com>
wrote:

> I pushed changes to move the provider into the “apache” directory. After
> updating the class references across the project, I re-tested and all tests
> passed.
>
> Regarding the use of Gremlin (or another graph query language like Cypher
> and SPARQL) for a common package approach, here are my thoughts on the pros
> and cons:
>
> pros (I can see only one):
>
>    - Gremlin has been widely adopted by different cloud vendors (e.g. Azure
>    Cosmos DB with Apache Gremlin and AWS Neptune) as well as in self-hosted
>    environments.
>
> cons:
>
>    - Gremlin, Cypher (native for Neo4j) and SPARQL each have their own
>    drivers for executing queries.
>    - To achieve a common abstraction, a wrapper around each driver would be
>    required. Each driver has its own connection parameters, underlying
>    protocols, and may need method overrides for compatibility with
>    different Python versions.
>    - Not all vendors support every query language; for instance, Gremlin
>    for Neo4j has been deprecated in recent releases, while Cosmos DB does
>    not support Cypher or SPARQL.
>
> While it would be ideal to have a unified graph query language and driver
> that works seamlessly across different vendors, such a solution does not
> exist at the moment. In my opinion, implementing provider-specific
> solutions for each query language (Gremlin, Cypher, SPARQL) is more
> realistic and practical given the current landscape.
>
> Happy to discuss further or answer any questions!
>
> Farhan
>
> On Mon, Feb 24, 2025 at 11:33 AM Ahmad Farhan <ahmad.farhan9...@gmail.com>
> wrote:
>
> > I have worked with two different graph database vendors—Azure Cosmos DB
> > and Neo4j. During our migration to Neo4j, we discovered that using the
> > Gremlin language wasn’t possible; we were forced to rewrite all our
> > queries into Cypher, which is the native language for Neo4j and, in my
> > experience, much simpler for querying.
> >
> > This situation highlights a key challenge for a common abstraction: the
> > underlying query languages and connection/authentication mechanisms vary
> > significantly. Gremlin is not only different from Cypher in syntax but is
> > also deprecated for Neo4j (see
> > https://tinkerpop.apache.org/docs/3.7.3/reference/#neo4j-gremlin).
> >
> > The question would be how can the common approach accommodate these
> > different query languages?
> >
> > On Fri, Feb 21, 2025 at 7:36 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> >> Without looking deeply at the code I love the idea - it's very similar
> >> to what we have for common.sql and common.io - and soon
> >> common.messaging - I also - long time ago - suggested common.dataframe
> >> that someone could submit using Apache Ibis:
> >> https://lists.apache.org/thread/qx3yh6h0l6jb0kh3fz9q95b3x5b4001l  -
> >> similarly I believe there was an idea about common.llm ...
> >>
> >> I think the "common" pattern is a great one for Airflow, to build on top
> >> of "other giants" who build those common abstractions that you can easily
> >> switch between different implementations of various data access layers.
> >>
> >> My suggestion and question - would be however (not very strong on it, I
> >> would love to hear what others think, I know it's been somewhat
> >> contentious when I started the ibis discussion) - would be to make it
> >> "common.graph", "common.dataframe" - instead of "apache.gremlin" or
> >> "apache.ibis" - just to stress that those are not implementations of a
> >> particular service but an opinionated choice of a particular technology
> >> to do "common" operations. This is what essentially "common.io" is - it
> >> would be named the "fsspec" provider if we were to name it by the
> >> "library" that implemented it.
> >>
> >> J.
> >>
> >>
> >> On Fri, Feb 21, 2025 at 8:22 PM Ahmad Farhan <ahmad.farhan9...@gmail.com>
> >> wrote:
> >>
> >> > Hi Everyone,
> >> >
> >> > I’ve created a draft PR (https://github.com/apache/airflow/pull/46977)
> >> > to introduce and discuss a new provider for using Gremlin, the graph
> >> > traversal language of Apache TinkerPop (more details here:
> >> > https://tinkerpop.apache.org/gremlin.html). Gremlin is supported by
> >> > various graph database vendors such as Azure Cosmos DB and Amazon
> >> > Neptune. Previously, I had to develop a custom hook to query data from
> >> > Azure Cosmos DB using Apache Gremlin.
> >> >
> >> > I managed to create a provider and run it locally on the main branch.
> >> > However, I ran into the BaseHook issue (
> >> > https://github.com/apache/airflow/issues/45233) on that branch, so I
> >> > ended up testing it fully on the v2-10-test branch. The PR should be
> >> > complete, but I’ve kept it as a draft for now while we discuss the
> >> > provider.
> >> >
> >> > I’m a new contributor, so I’m especially eager to hear your feedback.
> >> > Comments on the PR are very welcome, and please feel free to reach out
> >> > with any questions via email or Slack.
> >> >
> >> > Thanks,
> >> > Ahmad Farhan
> >> >
> >>
> >
>
