Re: [PR] HBASE-28647 Support streams in REST Client, RemoteHTable and RemoteAdmin [hbase]

via GitHub Mon, 08 Jul 2024 07:36:58 -0700


stoty commented on PR #6010:
URL: https://github.com/apache/hbase/pull/6010#issuecomment-2214255662


   > I do not fully understand what does the streams mean here...
   > 
   > All requests and responses are fully kept in memory here I think?
   
   Currently they are.
   The current code calls the Apache HttpClient getResponseBody() method, which 
causes the client to wait until all data is received and then loads it into a 
byte array.
   
   However, the goal is to avoid having to do that.
   
   Protobuf primarily works on streams, so for a large resultset, we may reduce 
both processing (wall clock) time and memory consumption by not buffering the 
whole response into memory, but reading directly from the stream, so that
   * We do not have to wait for the full response to arrive before starting to 
process it.
   * We do not have to copy the whole response into a single byte array.
   * The response segments that have already been processed can be garbage 
collected while we are processing the rest of the message.
   
   The Cell/Cellset structures are still kept in memory, but we avoid having to 
explicitly store the data twice during processing (once as the serialized byte 
array and once as the Java POJOs).
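
   To make the difference concrete, here is a minimal sketch of the two 
approaches. It is not the code from this PR and it uses neither HttpClient nor 
Protobuf: the class name StreamingSketch, the helper methods, and the 
length-prefixed record format are all assumptions made up for illustration, 
using only the JDK (Java 9+ for readAllBytes()).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical illustration only; none of these names exist in HBase.
final class StreamingSketch {

    // Assumed wire format: each record is a 4-byte big-endian length
    // followed by that many bytes of UTF-8 payload.
    static byte[] encode(String... records) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String record : records) {
                byte[] payload = record.getBytes(StandardCharsets.UTF_8);
                out.writeInt(payload.length);
                out.write(payload);
            }
        }
        return bytes.toByteArray();
    }

    // Buffered approach: wait for the end of the stream, copy the whole
    // body into one byte array, and only then start decoding it.
    static int processBuffered(InputStream body, Consumer<String> handler)
            throws IOException {
        byte[] whole = body.readAllBytes();
        return processStreaming(new ByteArrayInputStream(whole), handler);
    }

    // Streaming approach: decode one record at a time as bytes arrive.
    // Only the current record's payload is held here; earlier payloads
    // become garbage as soon as the handler is done with them.
    static int processStreaming(InputStream body, Consumer<String> handler)
            throws IOException {
        DataInputStream in = new DataInputStream(body);
        int count = 0;
        while (true) {
            int length;
            try {
                length = in.readInt();
            } catch (EOFException endOfStream) {
                // Sketch simplification: a truncated length header is
                // treated the same as a clean end of stream.
                return count;
            }
            byte[] payload = new byte[length];
            in.readFully(payload);
            handler.accept(new String(payload, StandardCharsets.UTF_8));
            count++;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = encode("row1", "row2", "row3");
        int n = processStreaming(new ByteArrayInputStream(wire),
                System.out::println);
        System.out.println(n + " records");
    }
}
```

   Both methods hand the same records to the handler. The difference is that 
processBuffered() cannot start until the whole body has arrived and holds the 
full serialized copy while decoding, whereas processStreaming() starts with the 
first record and never needs the single large array.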


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
