Thanks for the important clarification - reading up on UCX, it makes sense to 
implement a health check abstraction. A couple follow up musings:

1. "Hosting" vs "data plane" communication protocols. Clearly this isn't 
something worth bundling in right now, but it seems to me that health checks 
fall under a class of server behavior that is specific to the hosting 
environment as opposed to transporting data to the client efficiently. Going 
further, a simple non-GRPC HTTP health probe function would provide even more 
broad support for the server across multiple hosting platforms (eg, K8s without 
an experimental change) without changing any client facing semantics. Of 
course, this has added burden to the team maintaining the implementation, but 
something to consider/clarify as goals for a reference server implementation.

2. To make sure I understand the compatibility concern, I assume this is only 
for the case of applications that have already embedded a "sibling" health 
check RPC today due to the lack of built-in FSB implementation and not for 
existing FSB derivations that can simply use a built-in implementation on 
"upgrade", correct? Given the downsides of the existing workarounds 
(essentially the flight server can't easily be taken into consideration when 
deciding on OK response), I think it makes sense to have it opt-in within a 
minor version to ease transition, though I would think the next major version 
would flip the default. It's possible I didn't fully understand this 
complication, however, and I also don't feel very strongly (some hosting 
environments also have optional health checks, so perhaps it aligns well) due 
to being uninformed.

3. RE: not exposing the full breadth, did you mean that of the 2 messages in 
the gRPC proto (Check vs Watch) and 2 "branches" in the specified 
implementation (empty + specified "service name") there is a clear abstraction 
for users to customize the output/response for Check {service=""} vs the other 
forms? Or did you mean that the contract itself specified by grpc health check 
wouldn't be fulfilled by the server implementation? I think Check {service=""} 
is the most impactful by far certainly, but I'm not aware of downsides when 
partially implementing a check protocol like that (not sure what hosting 
environments might assume the presence of Watch, for example). If you meant 
consumers of FSB are only exposed to an abstraction that allows customizing 
Check (maybe subset) responses only, but the server implements the full grpc 
health protocol with trivial implementations for the rest, I think that makes 
total sense.

AK

-----Original Message-----
From: David Li <lidav...@apache.org> 
Sent: Monday, October 3, 2022 2:45 PM
To: dev@arrow.apache.org
Subject: [EXTERNAL] Re: [C++][Python]Built-in GRPC health checks in 
FlightServerBase

[You don't often get email from lidav...@apache.org. Learn why this is 
important at https://aka.ms/LearnAboutSenderIdentification ]

Hi Akshaya,

Thanks for chiming in. At first glance, I think this is a reasonable use case 
to cover. However, Flight does try to be independent of gRPC. So maybe the 
approach would be to define a "Flight health check" that would map to gRPC 
health check when using gRPC, and something else when not (for UCX, presumably 
a custom method). And the only other complication would be making this opt-in 
to avoid conflicting with existing applications.

That does mean we wouldn't be exposing the full breadth of the gRPC health 
check API. But Python would be covered and we could avoid having to redefine 
this all the time.

-David

On Mon, Oct 3, 2022, at 17:39, Akshaya Annavajhala (AK) wrote:
> Hello, I was helpfully redirected by David from [ARROW-17920] Built-in 
> GRPC health checks in FlightServerBase - ASF JIRA 
> (apache.org)<https://nam06.safelinks.protection.outlook.com/?url=https
> %3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FARROW-17920&amp;data=05%7
> C01%7Cakannava%40microsoft.com%7C198f81361dae42167d3b08daa588a73c%7C72
> f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638004303595147983%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=SJGSCFvKBOcfjC%2BBjzVQ%2F7XR4AcMjZ1MJKlcH7fP%2Bmg%3D&amp;reserved=0>
>  to the mailing list to discuss how a consumer of FlightServerBase might 
> meaningfully implement probe endpoints/health checks for production 
> environments more easily.
>
> The issue links to a related previous item that adds a C++ example for 
> running multiple GRPC servers behind the same 
> port<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2
> Fgithub.com%2Fapache%2Farrow%2Fpull%2F11524&amp;data=05%7C01%7Cakannav
> a%40microsoft.com%7C198f81361dae42167d3b08daa588a73c%7C72f988bf86f141a
> f91ab2d7cd011db47%7C1%7C0%7C638004303595147983%7CUnknown%7CTWFpbGZsb3d
> 8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C
> 3000%7C%7C%7C&amp;sdata=GXKLrLbhUNciFcgBptEbcbsxbpIhlq2iaWDjNdu1GAY%3D
> &amp;reserved=0> which I can translate simply into a side-by-side 
> health service and FSB in the same application. However, the 
> issue<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%
> 2Fissues.apache.org%2Fjira%2Fbrowse%2FARROW-14440&amp;data=05%7C01%7Ca
> kannava%40microsoft.com%7C198f81361dae42167d3b08daa588a73c%7C72f988bf8
> 6f141af91ab2d7cd011db47%7C1%7C0%7C638004303595147983%7CUnknown%7CTWFpb
> GZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0
> %3D%7C3000%7C%7C%7C&amp;sdata=%2Fz62q8KzcdQoNOf0zDNXM3gM%2BpLoTtffWwa4
> wlo1UYU%3D&amp;reserved=0> notes this same technique is unavailable to 
> python (implementations of
> FlightServerBase) and the separate servers implies a lack of 
> comingling logic related to the flight server itself and the health 
> check, effectively making the health probes very dumb logic/check box 
> implementation.
>
> In the issue I opened, I proposed built-in gRPC health check 
> protocol<https://nam06.safelinks.protection.outlook.com/?url=https%3A%
> 2F%2Fgithub.com%2Fgrpc%2Fgrpc%2Fblob%2Fmaster%2Fdoc%2Fhealth-checking.
> md%23grpc-health-checking-protocol&amp;data=05%7C01%7Cakannava%40micro
> soft.com%7C198f81361dae42167d3b08daa588a73c%7C72f988bf86f141af91ab2d7c
> d011db47%7C1%7C0%7C638004303595147983%7CUnknown%7CTWFpbGZsb3d8eyJWIjoi
> MC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7
> C%7C&amp;sdata=vkF3vUK6nijaT7xiV%2B5%2F%2BsY5mc5ZbbbmPEJ%2BzBElXjU%3D&
> amp;reserved=0> in the native FSB that would thus apply to a "simple" 
> python derivation without modifications as well as perhaps in the 
> future allowing the python space to override the implementation 
> (similar to do_*) for more advanced use cases that might require 
> extensive python space initialization, etc.
>
> I'd love to hear others' thoughts on the high-level 
> direction/alternatives on this topic!
>
> Regards,
> AK

Reply via email to