Thanks for the important clarification - reading up on UCX, it makes sense to implement a health check abstraction. A couple follow up musings:
1. "Hosting" vs "data plane" communication protocols. Clearly this isn't something worth bundling in right now, but it seems to me that health checks fall under a class of server behavior that is specific to the hosting environment as opposed to transporting data to the client efficiently. Going further, a simple non-GRPC HTTP health probe function would provide even more broad support for the server across multiple hosting platforms (eg, K8s without an experimental change) without changing any client facing semantics. Of course, this has added burden to the team maintaining the implementation, but something to consider/clarify as goals for a reference server implementation. 2. To make sure I understand the compatibility concern, I assume this is only for the case of applications that have already embedded a "sibling" health check RPC today due to the lack of built-in FSB implementation and not for existing FSB derivations that can simply use a built-in implementation on "upgrade", correct? Given the downsides of the existing workarounds (essentially the flight server can't easily be taken into consideration when deciding on OK response), I think it makes sense to have it opt-in within a minor version to ease transition, though I would think the next major version would flip the default. It's possible I didn't fully understand this complication, however, and I also don't feel very strongly (some hosting environments also have optional health checks, so perhaps it aligns well) due to being uninformed. 3. RE: not exposing the full breadth, did you mean that of the 2 messages in the gRPC proto (Check vs Watch) and 2 "branches" in the specified implementation (empty + specified "service name") there is a clear abstraction for users to customize the output/response for Check {service=""} vs the other forms? Or did you mean that the contract itself specified by grpc health check wouldn't be fulfilled by the server implementation? I think Check {service=""} is the most impactful by far certainly, but I'm not aware of downsides when partially implementing a check protocol like that (not sure what hosting environments might assume the presence of Watch, for example). If you meant consumers of FSB are only exposed to an abstraction that allows customizing Check (maybe subset) responses only, but the server implements the full grpc health protocol with trivial implementations for the rest, I think that makes total sense. AK -----Original Message----- From: David Li <lidav...@apache.org> Sent: Monday, October 3, 2022 2:45 PM To: dev@arrow.apache.org Subject: [EXTERNAL] Re: [C++][Python]Built-in GRPC health checks in FlightServerBase [You don't often get email from lidav...@apache.org. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] Hi Akshaya, Thanks for chiming in. At first glance, I think this is a reasonable use case to cover. However, Flight does try to be independent of gRPC. So maybe the approach would be to define a "Flight health check" that would map to gRPC health check when using gRPC, and something else when not (for UCX, presumably a custom method). And the only other complication would be making this opt-in to avoid conflicting with existing applications. That does mean we wouldn't be exposing the full breadth of the gRPC health check API. But Python would be covered and we could avoid having to redefine this all the time. -David On Mon, Oct 3, 2022, at 17:39, Akshaya Annavajhala (AK) wrote: > Hello, I was helpfully redirected by David from [ARROW-17920] Built-in > GRPC health checks in FlightServerBase - ASF JIRA > (apache.org)<https://nam06.safelinks.protection.outlook.com/?url=https > %3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FARROW-17920&data=05%7 > C01%7Cakannava%40microsoft.com%7C198f81361dae42167d3b08daa588a73c%7C72 > f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638004303595147983%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=SJGSCFvKBOcfjC%2BBjzVQ%2F7XR4AcMjZ1MJKlcH7fP%2Bmg%3D&reserved=0> > to the mailing list to discuss how a consumer of FlightServerBase might > meaningfully implement probe endpoints/health checks for production > environments more easily. > > The issue links to a related previous item that adds a C++ example for > running multiple GRPC servers behind the same > port<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2 > Fgithub.com%2Fapache%2Farrow%2Fpull%2F11524&data=05%7C01%7Cakannav > a%40microsoft.com%7C198f81361dae42167d3b08daa588a73c%7C72f988bf86f141a > f91ab2d7cd011db47%7C1%7C0%7C638004303595147983%7CUnknown%7CTWFpbGZsb3d > 8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C > 3000%7C%7C%7C&sdata=GXKLrLbhUNciFcgBptEbcbsxbpIhlq2iaWDjNdu1GAY%3D > &reserved=0> which I can translate simply into a side-by-side > health service and FSB in the same application. However, the > issue<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F% > 2Fissues.apache.org%2Fjira%2Fbrowse%2FARROW-14440&data=05%7C01%7Ca > kannava%40microsoft.com%7C198f81361dae42167d3b08daa588a73c%7C72f988bf8 > 6f141af91ab2d7cd011db47%7C1%7C0%7C638004303595147983%7CUnknown%7CTWFpb > GZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0 > %3D%7C3000%7C%7C%7C&sdata=%2Fz62q8KzcdQoNOf0zDNXM3gM%2BpLoTtffWwa4 > wlo1UYU%3D&reserved=0> notes this same technique is unavailable to > python (implementations of > FlightServerBase) and the separate servers implies a lack of > comingling logic related to the flight server itself and the health > check, effectively making the health probes very dumb logic/check box > implementation. > > In the issue I opened, I proposed built-in gRPC health check > protocol<https://nam06.safelinks.protection.outlook.com/?url=https%3A% > 2F%2Fgithub.com%2Fgrpc%2Fgrpc%2Fblob%2Fmaster%2Fdoc%2Fhealth-checking. > md%23grpc-health-checking-protocol&data=05%7C01%7Cakannava%40micro > soft.com%7C198f81361dae42167d3b08daa588a73c%7C72f988bf86f141af91ab2d7c > d011db47%7C1%7C0%7C638004303595147983%7CUnknown%7CTWFpbGZsb3d8eyJWIjoi > MC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7 > C%7C&sdata=vkF3vUK6nijaT7xiV%2B5%2F%2BsY5mc5ZbbbmPEJ%2BzBElXjU%3D& > amp;reserved=0> in the native FSB that would thus apply to a "simple" > python derivation without modifications as well as perhaps in the > future allowing the python space to override the implementation > (similar to do_*) for more advanced use cases that might require > extensive python space initialization, etc. > > I'd love to hear others' thoughts on the high-level > direction/alternatives on this topic! > > Regards, > AK