Hey Wayang community,

We’d like to open a discussion on *compliance-aware execution* in Apache Wayang, in particular on *preventing unintended outbound data transfers* when jobs run across hybrid or multi-tenant clusters.
*1. Title*
*Outbound Traffic Validation & Compliance Optimizer Extension*

*2. Motivation*
Users have asked: *“How can we guarantee that Apache Wayang jobs do not export data outside the secured boundary of the cluster?”*
Today, this guarantee is mostly provided externally, through firewalls and infrastructure-level fencing. For stricter compliance environments, however, *enforcing such guarantees at the logical plan level* would significantly increase trust and transparency in Wayang-based data workflows.

*3. Proposal Summary*
We propose introducing an *Outbound Compliance Mode* to Wayang’s query optimizer. In this mode, every sink operation would be validated against a configurable set of *allowed target clusters or zones*. Validation could happen during optimization and/or execution plan generation, and non-compliant plans could be either logged or rejected. This mechanism would prevent Wayang jobs from routing data to non-approved sinks, whether accidentally or intentionally.

*4. Detailed Description*
Protection against outbound transfers works at two layers, of which only the second is new:
*Level 1: Infrastructure Fencing (external)*
Outbound traffic is blocked by firewalls or network policies. This is already widely used and provides basic protection.
*Level 2: Active Flow Control (in Wayang)*
An optimizer extension could validate all *sink operators* against an allowlist of approved destinations, defined in configuration or via rule sets (e.g., allowed URIs, target types, or data zones).
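To make the Level 2 idea more concrete, here is a minimal, hypothetical Java sketch (Java 16+, no Wayang dependencies) of an allowlist check over sink destinations. It assumes each sink can be reduced to a destination URI plus a data-zone label; `SinkDescriptor`, `ComplianceMode`, and `OutboundComplianceValidator` are illustrative names, not part of the Wayang API, and how the plan's sink operators would be mapped onto them is left open.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of "Level 2: Active Flow Control". None of these types
// exist in Apache Wayang; they stand in for the plan's sink operators.
public class OutboundComplianceSketch {

    // Simplified view of a sink: its operator name, destination URI, and the
    // data zone (e.g., "eu") the destination belongs to.
    record SinkDescriptor(String operatorName, URI target, String dataZone) {}

    // LOG reports violations but lets the plan run; REJECT fails the plan.
    enum ComplianceMode { LOG, REJECT }

    static final class OutboundComplianceValidator {
        private final Set<String> allowedSchemes; // target types, e.g. "hdfs"
        private final Set<String> allowedHosts;   // approved target hosts
        private final Set<String> allowedZones;   // approved data zones
        private final ComplianceMode mode;

        OutboundComplianceValidator(Set<String> allowedSchemes, Set<String> allowedHosts,
                                    Set<String> allowedZones, ComplianceMode mode) {
            this.allowedSchemes = Set.copyOf(allowedSchemes);
            this.allowedHosts = Set.copyOf(allowedHosts);
            this.allowedZones = Set.copyOf(allowedZones);
            this.mode = mode;
        }

        // Collects one message per violated rule per sink. Matching is exact
        // and case-sensitive; a missing value never matches the allowlist.
        List<String> violations(List<SinkDescriptor> sinks) {
            List<String> found = new ArrayList<>();
            for (SinkDescriptor sink : sinks) {
                String scheme = sink.target().getScheme();
                String host = sink.target().getHost();
                if (scheme == null || !allowedSchemes.contains(scheme)) {
                    found.add(sink.operatorName() + ": target type '" + scheme + "' is not allowed");
                }
                if (host == null || !allowedHosts.contains(host)) {
                    found.add(sink.operatorName() + ": host '" + host + "' is not allowed");
                }
                if (sink.dataZone() == null || !allowedZones.contains(sink.dataZone())) {
                    found.add(sink.operatorName() + ": data zone '" + sink.dataZone() + "' is not allowed");
                }
            }
            return found;
        }

        // Returns true if all sinks comply. In LOG mode, violations are printed
        // to stderr and false is returned; in REJECT mode, the plan is refused.
        boolean check(List<SinkDescriptor> sinks) {
            List<String> found = violations(sinks);
            if (found.isEmpty()) {
                return true;
            }
            if (mode == ComplianceMode.REJECT) {
                throw new IllegalStateException("Non-compliant plan: " + found);
            }
            found.forEach(msg -> System.err.println("[compliance] " + msg));
            return false;
        }
    }

    public static void main(String[] args) {
        OutboundComplianceValidator validator = new OutboundComplianceValidator(
                Set.of("hdfs"), Set.of("namenode.internal"), Set.of("eu"), ComplianceMode.LOG);
        List<SinkDescriptor> sinks = List.of(
                new SinkDescriptor("HdfsSink", URI.create("hdfs://namenode.internal:8020/out"), "eu"),
                new SinkDescriptor("ExportSink", URI.create("s3://us-bucket/out"), "us"));
        // ExportSink violates all three rules, so this prints "compliant = false".
        System.out.println("compliant = " + validator.check(sinks));
    }
}
```

The LOG/REJECT switch mirrors the “log or reject” choice described above. One caveat of reducing sinks to `java.net.URI`: JDBC URLs such as `jdbc:postgresql://db:5432/app` parse as opaque URIs with a `null` host, so a real rule set would likely need per-sink-type extraction of the destination rather than a single generic parser.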
We envision:
• A *Compliance Query Optimizer Extension* (optionally activated)
• Declarative rules to validate:
  • Sink destination type (e.g., JDBC, S3, HDFS)
  • Target host or region (e.g., EU-only)
  • Sink configuration (e.g., encryption on/off)
• Rejection or logging of plans that violate compliance rules

Optionally, this could be extended to support:
• *CORS-like logic for data sinks*, where each sink declares the data zones from which it accepts inbound data
• *Smart contract-based approvals* for external writes, with enforced logging or audit trails

This would provide enterprise-grade compliance guarantees *at the planning layer*, beyond what firewalls alone can enforce.

*5. Alternatives Considered*
The standard approach today is *relying on network firewalls* and infrastructure-level policies. However, these provide no visibility or explainability within Wayang’s job planning phase. Another approach is *static code analysis*, but it would live outside Wayang and be harder to maintain.

*6. Next Steps / Call for Feedback*
We’re happy to draft a design proposal or implement a prototype if the community is interested in this direction. We’d especially welcome input on:
• Where in the optimizer pipeline this logic should live
• Whether this aligns with existing security/privacy goals
• Integration with metadata or provenance tracking

Looking forward to your thoughts!

Best regards,
Mirko

--
Dr. Mirko Kämpf
*Founder & Coach*
*maindset.ACADEMY*