Feature Proposal: Outbound Compliance Control for Data Sinks in Apache Wayang

Mirko Kämpf Tue, 25 Mar 2025 02:35:25 -0700

Hey Wayang community,

We’d like to open a discussion on *compliance-aware execution* in Apache
Wayang, in particular on *preventing unintended outbound data transfers*
when executing jobs across hybrid or multi-tenant clusters.



*1. Title*

*Outbound Traffic Validation & Compliance Optimizer Extension*


*2. Motivation*

Users have raised the following question:

*“How can we guarantee that Apache Wayang jobs do not export data outside
the secured boundary of the cluster?”*

Today, this guarantee is mostly achieved externally—through firewalls and
infrastructure-level fencing. However, for stricter compliance
environments, *enforcing such guarantees at the logical plan level* would
significantly increase trust and transparency in Wayang-based data
workflows.


*3. Proposal Summary*

We propose introducing an *Outbound Compliance Mode* to Wayang’s query
optimizer. In this mode, all sink operations would be validated against a
configurable set of *allowed target clusters or zones*. The validation
could occur during optimization and/or execution plan generation, and could
log or reject non-compliant plans.

When configured to reject, this mechanism would prevent Wayang jobs from
routing data to non-approved sinks, whether by accident or by intent. In
log-only mode it would at least make such routes visible.
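
To make the idea concrete, here is a minimal Python sketch of the
log-or-reject behaviour. It is illustrative only: Wayang itself is
Java-based, and `Sink`, `Mode`, `check_plan` and the zone names are
hypothetical, not existing Wayang APIs.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    LOG = "log"        # report violations, let planning continue
    REJECT = "reject"  # refuse to produce an execution plan


@dataclass(frozen=True)
class Sink:
    name: str  # hypothetical label of the sink operator
    zone: str  # hypothetical identifier of the sink's target zone


class NonCompliantPlanError(Exception):
    """Raised in REJECT mode when a sink targets a non-approved zone."""


def check_plan(sinks, allowed_zones, mode):
    """Return the non-compliant sinks; raise instead if mode is REJECT."""
    violations = [s for s in sinks if s.zone not in allowed_zones]
    if violations and mode is Mode.REJECT:
        names = ", ".join(s.name for s in violations)
        raise NonCompliantPlanError(f"sinks outside allowed zones: {names}")
    return violations
```

In `Mode.LOG` the caller receives the list of offending sinks and can
write it to an audit log; in `Mode.REJECT` no plan is produced at all.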


*4. Detailed Description*

The approach distinguishes two levels of protection, of which only the
second is new:

*Level 1: Infrastructure Fencing (external)*

Outbound traffic is blocked by firewalls or network policies. This is
already widely used and provides basic protection.

*Level 2: Active Flow Control (in Wayang)*

An extension to the query optimizer could validate all *sink operators*
against an allowlist of approved destinations, defined in configuration or
via rule sets (e.g., permitted URIs, target types, or data zones).


We envision:

• A *Compliance Query Optimizer Extension* (activated optionally)

• Declarative rules to validate:

    • Sink destination type (e.g., JDBC, S3, HDFS)

    • Target host or region (e.g., EU-only)

    • Sink configuration (e.g., encryption on/off)

• Rejection or logging of plans that violate compliance rules
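
As a rough illustration of what such declarative rules might look like,
the Python sketch below checks one sink description against a rule set
covering type, host, region and encryption. All names (`SinkSpec`,
`RULES`, `violations`, the example hosts) are assumptions made for this
example; nothing here reflects an existing Wayang interface.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class SinkSpec:
    sink_type: str   # e.g. "jdbc", "s3", "hdfs"
    uri: str         # target URI of the sink
    region: str      # hypothetical region tag of the target
    encrypted: bool  # whether the sink writes encrypted data


# Hypothetical rule set; in practice this could be loaded from configuration.
RULES = {
    "allowed_types": {"jdbc", "hdfs"},
    "allowed_hosts": {"namenode.internal.example"},
    "allowed_regions": {"eu"},
    "require_encryption": True,
}


def violations(sink, rules):
    """Return a list of human-readable rule violations for one sink."""
    found = []
    if sink.sink_type not in rules["allowed_types"]:
        found.append(f"sink type '{sink.sink_type}' is not allowed")
    host = urlparse(sink.uri).hostname
    if host not in rules["allowed_hosts"]:
        found.append(f"target host '{host}' is not allowed")
    if sink.region not in rules["allowed_regions"]:
        found.append(f"region '{sink.region}' is not allowed")
    if rules["require_encryption"] and not sink.encrypted:
        found.append("encryption is required but disabled")
    return found
```

Returning every violation, rather than stopping at the first, keeps the
result usable both for rejecting a plan and for explaining why.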


Optionally, this could be extended to support:

• *CORS-like logic for data sinks*, where the sink declares allowed inbound
data zones

• *Smart contract-based approvals* for external writes, with enforced
logging or audit trails
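
The CORS-like variant inverts the direction of the check: the sink
publishes which data zones it accepts, and the planner compares that
declaration with the zones the job reads from. A minimal Python sketch,
assuming a hypothetical `allowed_inbound_zones` policy field and a `*`
wildcard borrowed from CORS:

```python
def sink_accepts(source_zones, sink_policy):
    """Check whether a sink's declared policy admits all source zones.

    `sink_policy` is a hypothetical dict published by the sink; a missing
    declaration means nothing is accepted (default deny).
    """
    allowed = set(sink_policy.get("allowed_inbound_zones", set()))
    if "*" in allowed:
        return True
    zones = set(source_zones)
    # Unknown (empty) provenance is treated as non-compliant.
    return bool(zones) and zones <= allowed
```

Default deny is a design choice in this sketch: a sink that declares
nothing, or a job whose source zones are unknown, is treated as
non-compliant.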


This would provide enterprise-grade compliance guarantees *at the planning
layer*—beyond what firewalls alone can enforce.


*5. Alternatives Considered*

The standard approach today is *relying on network firewalls* and
infrastructure-level policies. However, these do not provide visibility or
explainability inside the Wayang job planning phase.

Another approach is *static code analysis*, but this would live outside
Wayang and be harder to maintain.

*6. Next Steps / Call for Feedback*

We’re happy to draft a design proposal or implement a prototype if this
direction is of interest to the community.

We’d especially welcome input on:

• Where in the optimizer pipeline this logic should live

• Whether this aligns with existing security/privacy goals

• Integration with metadata or provenance tracking


Looking forward to your thoughts!


Best regards,

Mirko

-- 
Dr. Mirko Kämpf
*Founder & Coach*
*maindset.ACADEMY*
