Hi All,

One big enterprise use case emerging as more people consolidate
Data Lakehouses and Catalogs is commonly known as "Data Sharing" or,
more specifically now that open table formats are being adopted,
"Open Data Sharing".

Examples of existing managed service providers' Data Sharing features:

https://www.databricks.com/product/delta-sharing
https://docs.snowflake.com/en/user-guide/data-sharing-intro
https://docs.aws.amazon.com/redshift/latest/dg/datashare-overview.html
https://docs.cloud.google.com/bigquery/docs/analytics-hub-introduction
https://learn.microsoft.com/en-us/fabric/governance/external-data-sharing-overview

The basic idea is that when you share data between different companies, you
need a first-class governance/management layer and extra bells and whistles
that are distinct from the basic capabilities of RBAC or generalized
access control (e.g. if you're sharing across partially-untrusted org
boundaries, you don't just let the consumer organization log into your
data lake like one of your own employees).

JB and I put together this high-level proposal for supporting Open Sharing
in Polaris:

https://docs.google.com/document/d/1Y0yQi5iWbmuTHPkFiIs7WjIiC3EXJTl1PzZ-wtoRnZ0/edit?usp=sharing

Tentatively, it means adding ~5 logical data model constructs, some of
which may be first-class PolarisEntity types, others subtypes of existing
entities, and others just nested constructs:

   - ShareEntity (would behave similarly to a Catalog)
   - ExternalConsumer (mostly inherits from Principal)
   - Listing (similar to a "role grant" but with different metadata)
   - EndpointConfig (nested config under Listing)
   - ShareMembership (similar to a "securable grant" but with different
     metadata)
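
To make the list above a bit more concrete, here is a rough, purely
illustrative sketch of how the five constructs might relate to each other.
Everything in it is an assumption for discussion: the field names
(base_url, protocol, organization, table_identifier) and the
visible_tables helper are hypothetical placeholders, not taken from the
proposal doc or from the Polaris codebase, and the real entities would be
Java types rather than Python dataclasses.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ShareEntity:
    """Named container of shared objects; behaves similarly to a Catalog."""
    name: str


@dataclass(frozen=True)
class ExternalConsumer:
    """Identity outside the provider's org; mostly inherits from Principal."""
    name: str
    organization: str  # hypothetical field


@dataclass(frozen=True)
class EndpointConfig:
    """Nested config under Listing; both fields are hypothetical."""
    base_url: str
    protocol: str = "iceberg-rest"


@dataclass(frozen=True)
class Listing:
    """Like a role grant: offers one share to one consumer, plus metadata."""
    share: ShareEntity
    consumer: ExternalConsumer
    endpoint: EndpointConfig


@dataclass(frozen=True)
class ShareMembership:
    """Like a securable grant: places one table inside one share."""
    share: ShareEntity
    table_identifier: str  # hypothetical field


def visible_tables(
    consumer: ExternalConsumer,
    listings: List[Listing],
    memberships: List[ShareMembership],
) -> List[str]:
    """Return the sorted tables a consumer can reach through its listings."""
    # Shares that have been listed to this particular consumer.
    shares = {l.share.name for l in listings if l.consumer.name == consumer.name}
    # Tables that are members of any of those shares.
    return sorted(
        m.table_identifier for m in memberships if m.share.name in shares
    )
```

The point of the sketch is only that a consumer never gets grants on the
underlying catalog directly: it sees a table only if a Listing connects it
to a share and a ShareMembership puts that table in the share.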

Feedback/comments welcome! I'll also bring it up for live discussion if
there's time in the community sync.

Cheers,
Dennis
