Hi Lukasz and Ankur,

Thank you so much for your response! This is what we're doing/implementing
in our internal fork right now:

   1. We assume that the Java process and the Python process *are always
   colocated on the same host*, so first of all we bind to the "loopback"
   address instead of the "any" address that's currently used on the Java
   side. That way, the traffic between the SDK worker and the runner stays on
   the host and is not exposed to the network.
   2. Because of the multi-tenant nature of our environment, we still want
   authentication even on localhost, so that random processes cannot connect
   to the data ports. Because each job has its own user name, it's
   sufficient to *use the file system to store an ad-hoc secret*, which can
   be shared by both the Python SDK and the Java runner. The runner uses
   this secret to authenticate the worker (via a gRPC interceptor that
   implements this custom auth).
   3. With the two steps above in place, we *no longer need transport layer
   security* (SSL/TLS), so we have abandoned our initial plan to enable SSL/TLS.
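
Step 2 could be sketched roughly as follows. This is a minimal Python
illustration, not our actual implementation: the helper names
(write_secret, read_secret, is_authorized) and the "x-worker-secret"
metadata key are hypothetical, it assumes a POSIX file system, and the gRPC
interceptor itself is left out (it would call is_authorized on each call's
metadata).

```python
import hmac
import os
import secrets
import stat
import tempfile

# Hypothetical metadata key; the real header name is an assumption.
SECRET_HEADER = "x-worker-secret"


def write_secret(path: str) -> str:
    """Create a fresh random secret in a file only the owner can access."""
    token = secrets.token_hex(32)
    # O_EXCL refuses to reuse an existing file; 0o600 limits access to the owner.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(token)
    return token


def read_secret(path: str) -> str:
    """Read the secret, rejecting files that group or others can access."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        raise PermissionError(f"{path} is accessible by other users")
    with open(path) as f:
        return f.read().strip()


def is_authorized(metadata, expected: str) -> bool:
    """Check the secret presented in call metadata (key/value pairs)."""
    presented = dict(metadata).get(SECRET_HEADER, "")
    # Constant-time comparison avoids leaking the secret through timing.
    return hmac.compare_digest(presented.encode(), expected.encode())


if __name__ == "__main__":
    secret_path = os.path.join(tempfile.mkdtemp(), "worker_secret")
    write_secret(secret_path)              # side that creates the secret
    expected = read_secret(secret_path)    # other side reads the same file
    print(is_authorized([(SECRET_HEADER, expected)], expected))
```

Relying on file permissions keeps the secret scoped to the job's user, which
is why no extra key distribution is needed as long as both processes run on
the same host under the same user.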

The three steps above are the high-level plan that I'm implementing. I would
like to have a similar solution in open source so that it can be merged with
our internal fork. Let me know what you think. If this sounds OK, I will
create a ticket for myself and will first send out a short write-up in a
Google Doc to collect comments soon.

Thanks,
Hai

On Fri, Apr 26, 2019 at 5:24 PM Ankur Goenka <goe...@google.com> wrote:

> In an offline chat with Hai, it seemed useful for users to be able to
> provide custom authentication, such as a secret that is distributed out of
> band by the infrastructure and provided via the file system, an RPC to
> another service, etc.
> gRPC already has mechanisms for standard and custom authentication [1].
> Instrumenting the gRPC channel via a command-line option or environment
> variable on the worker machines could be useful.
>
> [1] https://grpc.io/docs/guides/auth/
>
> On Fri, Apr 26, 2019 at 4:33 PM Lukasz Cwik <lc...@google.com> wrote:
>
>> The link to the ApiServiceDescriptor is
>> https://github.com/apache/beam/blob/476e17ed6badd4d5c06c4caf8a824805f40a8e7a/model/pipeline/src/main/proto/endpoints.proto#L31
>>
>> On Fri, Apr 26, 2019 at 4:32 PM Lukasz Cwik <lc...@google.com> wrote:
>>
>>> I had originally taken a look at this a while ago but not much has
>>> progressed since then. The original idea was that the ApiServiceDescriptor
>>> would be extended to support secure ways of authentication/communication. I
>>> was prototyping with an OAuth2 client credentials grant at the time but
>>> dropped it as other things were more important. The only currently
>>> supported mode across all SDKs is an implicit authenticated/secure mode
>>> where all communication is assumed to already be encrypted/private (e.g.
>>> over VPN that is managed externally with trusted services) and hence the
>>> gRPC channel itself is insecure and there is no authentication being
>>> performed.
>>>
>>> Even though sdk_worker.py seems like it supports credentials, no one
>>> invokes the constructor with credentials enabled as can be seen by this
>>> comment by Robert[1].
>>>
>>> For SSL/TLS support, it seems like we need some way to tell a runner
>>> to use SSL/TLS (potentially with a custom private key and trust
>>> chain). Do you have any suggestions on how we could add support for
>>> passing around channel/call[2] credentials?
>>>
>>> 1:
>>> https://github.com/apache/beam/blob/476e17ed6badd4d5c06c4caf8a824805f40a8e7a/sdks/python/apache_beam/runners/worker/sdk_worker_main.py#L139
>>> 2: https://grpc.io/docs/guides/auth/
>>>
>>> On Tue, Apr 23, 2019 at 5:06 PM Hai Lu <lhai...@apache.org> wrote:
>>>
>>>> Hi,
>>>>
>>>> This is Hai from LinkedIn. Daniel and I have been working on
>>>> productionizing the Samza portable runner. BTW, Daniel didn't mention in
>>>> his previous email that he has enabled and validated Python 3 for the
>>>> Samza runner and it worked smoothly. Kudos to the team!
>>>>
>>>> I have a few security-related questions about portability. At
>>>> LinkedIn, we enable SSL/TLS and ACLs for Kafka data and any data exchange.
>>>> In the case of the portable runner, we're required to secure the data
>>>> channels between the Java and Python processes as well, because our Samza
>>>> jobs run in a multi-tenant environment. While I'm currently working on
>>>> this on our internal branch, I do want to keep it clean and consistent
>>>> with the master branch.
>>>>
>>>> My questions are: were there any plans/thoughts around security for
>>>> portability? I see that sdk_worker.py does have some code to create
>>>> secured gRPC channels; is anyone actually using that code? I don't
>>>> see any corresponding work on the Java side, though.
>>>>
>>>> Thanks,
>>>> Hai Lu
>>>>
>>>
