I believe running the Thrift gateway over HTTP would likely yield performance
similar to Protobuf serialization over REST. The primary advantage in both
cases would be the efficiency gained through binary serialization. While
HTTP/2, possibly with gRPC, could be a superior option, adopting it would
require a completely new implementation.

The original discussion was prompted by feedback from developers who want us
to provide a wrapper that implements our standard Table/Admin Java interfaces
on top of the existing REST client, as we already do for Thrift. However,
there does not appear to be unanimous support for wrapping the current
RestClient with Table/Admin functionality. Therefore, let's leave things as
they are and reconsider if interest from the group grows in the future.

Regards,
Ankit Singhal

On Thu, 13 Jun 2024 at 10:16, Andrew Purtell <andrew.purt...@gmail.com>
wrote:

> It’s not simply overhead though. The current gateway is a relic from when
> http v1 was the only game in town and simply cannot support many different
> kinds of short request use cases.
>
> Why not run the thrift gateway over http? It would immediately offer far
> superior performance and does not require much, if anything, in the way of
> new design or code.
>
> If we want to properly contemplate http protocol support with good
> performance, and avoid thrift serialization for some reason, then consider
> a redesign with http v2 as the foundation and a high-performance
> serialization and transport. Something from the Apache Arrow ecosystem may
> be a good starting point.
>
> > On Jun 13, 2024, at 9:10 AM, Ankit Singhal <ankitsingha...@gmail.com>
> > wrote:
> >
> > To clarify, Bryan, the deployment you're referring to follows the usual
> > pattern used by most users in both cloud and on-premises environments when
> > the client is local to the cluster or has a direct line of sight to all
> > the region servers. However, we're increasingly encountering another use
> > case. In a hybrid setup, an on-prem client needs to access cloud
> > infrastructure but, due to network limitations (especially in financial
> > institutions), can only see a single gateway (serviced via a scalable load
> > balancer) backed by the REST proxy, instead of having direct access to all
> > the region servers. This scenario also applies to Kubernetes, as Duo
> > mentioned, where a client outside the K8s cluster requires a single
> > ingress endpoint to communicate with the cluster in K8s.
> >
> > Regarding the service discovery issue, we (primarily Balazs Meszaros)
> > have internally developed a Master and RegionServer proxy, which we will
> > try to contribute to Apache. This proxy broadcasts a single address to the
> > client, handles all incoming requests, executes them on the region
> > servers on behalf of the client, and then sends the responses back. While
> > we believe this will be widely adopted, it has an overhead similar to that
> > of the REST proxy: marshaling/encrypting requests and
> > decrypting/unmarshaling responses.
> >
> > I agree that HTTP is not the most performant option, and our native
> > protocol would indeed perform better with the Java client. However, we've
> > observed that many users are willing to accept a small overhead for the
> > simplicity it brings to networking, especially when they cannot expose
> > all the servers hosting regionservers. Until we have a solution for
> > service discovery through a single gateway, this is often the only viable
> > option for our users in such scenarios.
> >
> > Thanks,
> > Ankit Singhal
> >
> >> On Thu, 13 Jun 2024 at 06:33, 张铎(Duo Zhang) <palomino...@gmail.com>
> >> wrote:
> >>
> >> Deploying several hbase-rest servers in front of the HBase cluster is
> >> a way to solve the service discovery problem, as you can just use an L7
> >> ingress to route requests to these proxies, and it is also easy to do
> >> load balancing.
> >>
> >> But for me, I would still suggest that we try to find a better way to
> >> solve the service discovery problem in the native java client...
> >>
> >> Thanks.
> >>
> >> 张铎(Duo Zhang) <palomino...@gmail.com> wrote on Thu, Jun 13, 2024 at 21:30:
> >>>
> >>> Agree, it is more like a service discovery problem.
> >>>
> >>> For performance, the official HBase Java client is a rich client
> >>> with a built-in service discovery method, which may not work well
> >>> in some cloud environments.
> >>>
> >>> Bryan Beaudreault <bbeaudrea...@apache.org> wrote on Thu, Jun 13, 2024 at 21:19:
> >>>>
> >>>> We deploy kube-proxy on all of our non-kube ec2 nodes. So any client can
> >>>> still use the Services we've defined to talk to the pods. Our HMaster(s)
> >>>> is a Deployment with a Service; our NameNode(s) is a StatefulSet, and we
> >>>> have a dedicated service per namenode identifier; etc.
> >>>>
> >>>> In our hbase-site.xml, we set the bootstrap nodes to the HMaster Service.
> >>>> In our hdfs-site.xml we configure each namenode to point at the specific
> >>>> namenode service.
> >>>>
> >>>> This all works well for us, even with a mixed topology. But yea, that
> >>>> seems less about what protocol (http vs protobuf) and more about
> >>>> discovery.
> >>>>
> >>>> On Thu, Jun 13, 2024 at 9:08 AM 张铎(Duo Zhang) <palomino...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> I guess the problem is that, in some k8s setups, inside the deployment
> >>>>> you can use the pod IPs to connect to each other. That's the case for
> >>>>> communication inside the HBase cluster, like regionServerReport, etc.
> >>>>>
> >>>>> But outside the deployment, you can only access these services/pods
> >>>>> through an ingress, whether L7 or L4, so you need to use a different
> >>>>> identifier.
> >>>>>
> >>>>> In our current design, there is nothing like an 'advertised address'
> >>>>> for the masters and region servers in the cluster, so there is no
> >>>>> way for clients to use different identifiers when connecting to the
> >>>>> HBase cluster.
> >>>>>
> >>>>> But I think for Java-based clients, we'd better try another way to
> >>>>> support the native protocol, for better performance.
> >>>>>
> >>>>> Thanks.
> >>>>>
> >>>>> Bryan Beaudreault <bbeaudrea...@apache.org> wrote on Thu, Jun 13, 2024 at 19:38:
> >>>>>>
> >>>>>> Can you speak more to why HTTP would help users?
> >>>>>>
> >>>>>> We’ve been running hbase in a 100% cloud environment for more than a
> >>>>>> decade. We’ve never really desired an http version of the protocol.
> >>>>>>
> >>>>>> More recently, we also run in a mixed kubernetes environment (master
> >>>>>> nodes in kubernetes, regionservers on dedicated ec2 nodes). In this
> >>>>>> model we’ve similarly had no real desire for http when working with
> >>>>>> services, etc.
> >>>>>>
> >>>>>> One case where it’s been useful to have an http endpoint is readiness
> >>>>>> checks. We’ve built a /health endpoint into the hmaster and
> >>>>>> regionserver to that end, which I plan to upstream at some point.
> >>>>>>
> >>>>>> To me, creating a totally compatible http based protocol seems like a
> >>>>>> huge lift. So I’m curious how it’d really help these cloud users.
> >>>>>>
> >>>>>> On Thu, Jun 13, 2024 at 1:07 AM Ankit Singhal <ankitsingha...@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> *Context for discussion*:
> >>>>>>>
> >>>>>>> These classes were recently relocated from "src/main" to "src/test"
> >>>>>>> under HBASE-24115, aligning with the original contributor's initial
> >>>>>>> intention of using them only for testing. Therefore, I'm raising this
> >>>>>>> in an email to initiate discussion with the updated information.
> >>>>>>>
> >>>>>>> At Cloudera, we've noticed a growing adoption of HBase in cloud
> >>>>>>> environments. This shift has highlighted concerns regarding network
> >>>>>>> connectivity requirements between clients and HBase regionservers,
> >>>>>>> particularly for remote clients. To address these concerns, we are
> >>>>>>> looking to utilize standard web protocols such as HTTP. These
> >>>>>>> protocols enable easier integration with various cloud services by
> >>>>>>> providing a single endpoint for access and simplifying networking
> >>>>>>> needs. As a result, more users are interested in using REST servers
> >>>>>>> to meet their requirements. To that end, Istvan has put considerable
> >>>>>>> effort into testing and improving the performance of REST servers,
> >>>>>>> as seen in JIRAs like HBASE-28646, HBASE-28613, HBASE-28626, and
> >>>>>>> HBASE-28556.
> >>>>>>>
> >>>>>>> *Issue*:
> >>>>>>>
> >>>>>>> However, users with applications currently using the Java client are
> >>>>>>> encountering challenges in transitioning to REST due to the
> >>>>>>> significant code rewriting required. By implementing the Admin and
> >>>>>>> Table interfaces, we can enable these users to migrate to REST with
> >>>>>>> minimal adjustments.
> >>>>>>>
> >>>>>>> *Other Protocol Implementations with Java Public APIs:*
> >>>>>>> Similar interfaces have recently been developed for Thrift under
> >>>>>>> HBASE-21661.
> >>>>>>>
> >>>>>>> *Proposed Changes:* Currently, Istvan is focusing on the performance
> >>>>>>> and security aspects of these implementations, with efforts like
> >>>>>>> HBASE-28540 and rewriting the REST client to support different
> >>>>>>> authentication options, etc. via HBASE-28501, HBASE-28649, and
> >>>>>>> HBASE-28500, which have significantly strengthened the
> >>>>>>> implementation. Hence, we want to go ahead and move this
> >>>>>>> implementation back to "src/main".
> >>>>>>>
> >>>>>>> Please let us know of any concerns you may have; otherwise, we would
> >>>>>>> like to proceed with the PR.
> >>>>>>>
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>>
> >>>>>>> Ankit Singhal
> >>>>>>>
> >>>>>
> >>
>
