Hi Holden,

Thanks again for the detailed comments and suggestions.
I’ve responded inline in the document and will revise the SPIP to make several areas more explicit. For visibility, here is a short summary:

1) Security (new IPC mechanism)
We will add a dedicated security section. Overall, this should be no worse than the current socket-based implementation. Moving to gRPC may actually improve our position by leveraging existing ecosystem support for TLS, authentication, interceptors, and observability, which are harder to standardize correctly on top of a raw socket protocol.

2) Performance assumptions
Agreed: we should back claims with systematic benchmarking. We have an early gRPC prototype with preliminary results comparable to the current socket path, but we will avoid strong claims until it is properly benchmarked. The existing Python/Scala paths will remain, and any default switch would only happen after meeting explicit performance goals.

3) Fallback / migration strategy
We will make this explicit in the SPIP. The plan is to separate the transport layer from the UDF processing logic in worker.py, so that the gRPC and socket paths share the same execution logic. This enables safe fallback and reduces long-term dual-maintenance overhead.

4) Worker specification
We do have a more detailed design and can publish it as a supporting document. The SPIP will clarify the expected structure and required metadata without going too deep into implementation detail.

5) Dependency management
This will be defined in the worker specification. Each language implementation defines its dependency requirements, and clusters are expected to provision environments accordingly (as is already the case for Python today).

6) Unified query planning concerns
The intent is not to force identical planning behavior across languages. The worker specification can expose metadata (e.g., pipelining support, concurrency, memory characteristics, data format constraints), allowing the planner to remain flexible and language-aware without hardcoding per-language rules.
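To make the fallback idea in (3) concrete, here is a rough sketch of what separating the transport from the execution loop could look like. All names here (Transport, SocketTransport, run_worker) are hypothetical and not part of the SPIP; the point is only that a gRPC transport and a socket transport would implement the same small interface while the UDF loop stays shared.

```python
# Hypothetical sketch: decouple transport from UDF execution so the gRPC
# and socket paths can share one execution core. Names are illustrative,
# not taken from the SPIP or from worker.py.
from abc import ABC, abstractmethod


class Transport(ABC):
    """Reads serialized input batches and writes result batches."""

    @abstractmethod
    def read_batch(self):
        ...

    @abstractmethod
    def write_batch(self, batch):
        ...


class SocketTransport(Transport):
    """Stand-in for the current socket path; a GrpcTransport would
    implement the same interface over a gRPC stream."""

    def __init__(self, batches):
        self._batches = iter(batches)
        self.written = []  # stand-in for the socket's output stream

    def read_batch(self):
        return next(self._batches, None)

    def write_batch(self, batch):
        self.written.append(batch)


def run_worker(transport, udf):
    """Shared execution logic: apply the UDF batch by batch, independent
    of which transport carried the data."""
    while (batch := transport.read_batch()) is not None:
        transport.write_batch([udf(x) for x in batch])
```

Because run_worker only sees the Transport interface, falling back from gRPC to sockets would be a matter of swapping the transport object, not duplicating the UDF logic.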
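For (6), a minimal sketch of the kind of metadata record the planner might consult; the field and function names are my own for illustration, not proposed API.

```python
# Hypothetical capability record exposed by a worker implementation.
# Field names are illustrative only; the SPIP would define the real schema.
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerCapabilities:
    language: str
    supports_pipelining: bool
    max_concurrency: int
    data_formats: tuple  # e.g. ("arrow",)


def planner_can_chain(caps: WorkerCapabilities) -> bool:
    # The planner stays language-agnostic: it checks declared capabilities
    # rather than hardcoding per-language rules.
    return caps.supports_pipelining
```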
7) Inter-UDF pipelining
Pipelining is supported by the protocol design (similar to PySpark). The init message can declare multiple UDFs and define chaining and input mappings. Whether a language supports this can be expressed in the worker metadata so that planning can respect it.

Hopefully this addresses the main concerns. I’ll update the SPIP to reflect these clarifications more explicitly. Thanks again for the thoughtful review.
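To illustrate the pipelining point in (7), here is a toy version of an init message declaring two chained UDFs, plus a helper that wires the input mappings. The message shape and key names are assumptions for the sake of the example, not the actual protocol.

```python
# Hypothetical init-message shape declaring two chained UDFs.
# "source" stands in for the incoming column; key names are illustrative.
init_message = {
    "udfs": [
        {"id": "f", "input": "source"},  # reads the incoming data
        {"id": "g", "input": "f"},       # consumes f's output (chaining)
    ],
}


def run_chain(msg, fns, value):
    """Apply the declared UDFs in order, resolving each input mapping."""
    outputs = {"source": value}
    for spec in msg["udfs"]:
        outputs[spec["id"]] = fns[spec["id"]](outputs[spec["input"]])
    return outputs[msg["udfs"][-1]["id"]]
```

A planner could inspect the worker's metadata before emitting such a chained init message, falling back to one-UDF-per-round-trip for languages that do not declare pipelining support.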
