bbeaudreault commented on pull request #3565: URL: https://github.com/apache/hbase/pull/3565#issuecomment-903690738
Thank you for your comment, Anoop. The problem we face is that we have hundreds of engineers using HBase in innumerable ways on multi-tenant clusters. People shift teams, leave, get hired, etc. When you set out to read a row from HBase, you don't always know how large that row will be. I agree that if someone knew they were going to fetch a large row, they should use a Scan, but given the above, people often don't. This new feature acts as a guardrail against causing RegionServer pain in those cases.

The way we've used this feature is that we have our own TableFactory that everyone must use to get tables. The returned Table objects are wrapped. When a Get is submitted, the wrapper uses this feature to limit the max result size. When data is returned, we inspect the Result and throw an exception if it's a partial result. At that point the user can rewrite their query to use a Scan, add a filter, etc. We also have an escape hatch that lets urgent requests through, but its use is audited. (A rough sketch of the wrapper is at the bottom of this comment.)

That said, I just noticed `hbase.table.max.rowsize`. Surprisingly, it was added shortly after our original patch for this, and we hadn't noticed it until now. The default value looks too large, but given that we've already been enforcing a limit, I think we might be able to move to that instead.
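For anyone curious, here is a minimal sketch of the get() path of that wrapper. `GuardedGet` and `MAX_GET_RESULT_SIZE_BYTES` are illustrative names, not our actual code, and the call that applies the size limit to the Get is the API this PR proposes, so it is left as a commented placeholder rather than a concrete method. The partial-result check uses `Result.mayHaveMoreCellsInRow()` from the client API.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;

/**
 * Rough sketch of the guardrail described above: a thin wrapper that a
 * TableFactory hands out instead of the raw Table. Only the get() path is
 * shown; a real wrapper would delegate the rest of the Table interface
 * unchanged.
 */
public class GuardedGet {

  /** Illustrative per-Get size cap; not an HBase config. */
  private static final long MAX_GET_RESULT_SIZE_BYTES = 4L * 1024 * 1024;

  private final Table delegate;

  public GuardedGet(Table delegate) {
    this.delegate = delegate;
  }

  public Result get(Get get) throws IOException {
    // Applying the cap to the Get is the API this PR proposes, so it is
    // shown only as a placeholder here:
    // get.setMaxResultSize(MAX_GET_RESULT_SIZE_BYTES);

    Result result = delegate.get(get);

    // If the server stopped early because of the size limit, surface that
    // to the caller instead of silently returning a truncated row.
    if (result.mayHaveMoreCellsInRow()) {
      throw new IOException("Row for Get on " + delegate.getName()
          + " exceeded " + MAX_GET_RESULT_SIZE_BYTES
          + " bytes; rewrite as a Scan or restrict the columns/filter");
    }
    return result;
  }
}
```

Throwing rather than silently truncating is the point: it forces the caller to make an explicit decision, whether that's switching to a Scan, adding a filter, or going through the audited escape hatch.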
