[ 
https://issues.apache.org/jira/browse/HBASE-26519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451801#comment-17451801
 ] 

Bryan Beaudreault commented on HBASE-26519:
-------------------------------------------

Good call. Done. I'll leave this open for now and relay any decisions or close 
it once discussion has finished.

> StoreFileScanner parallel seek -- productionize or drop?
> --------------------------------------------------------
>
>                 Key: HBASE-26519
>                 URL: https://issues.apache.org/jira/browse/HBASE-26519
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Bryan Beaudreault
>            Priority: Minor
>
> hbase.storescanner.parallel.seek.enable was added a few years ago in 
> https://issues.apache.org/jira/browse/HBASE-7495, but is still disabled by 
> default. The property's description says "Enables StoreFileScanner 
> parallel-seeking in StoreScanner, a feature which can reduce response latency 
> under special conditions".
> It's not very clear what "special conditions" means. Reading through the 
> entire comment history on that issue seems to indicate it can help when you 
> have "high random read, low cache hit rate, many store files". 
> We have a bunch of clusters with this shape, and in fact we use SSDs for all 
> storage so I figured this might help a lot. I tried setting this to true on 
> one RegionServer of one of our highest QPS clusters hoping I'd see some clear 
> improvement. This very simple test was pretty much a wash, so I need to do 
> more methodical testing.
> One question came up during the test, though: is the default thread pool size 
> of 10 good enough for my use case? I have no way of knowing, as I can find no 
> logging or metrics around thread pool saturation. What I ended up doing was 
> repeatedly refreshing the /dump endpoint of the RS, where I noticed that there 
> were sometimes 1-5 tasks queued for the RS_PARALLEL_SEEK executor. This 
> suggests I should scale up the thread pool, but use cases change over time, so 
> polling /dump by hand is not a reliable way to size it.
> Task queuing seems undesirable for a feature aimed at reducing latencies. I 
> wonder if we should consider some changes to make this easier to deploy in 
> production. Here are some ideas I had:
>  * Can we generate a better default value for the thread pool size, maybe 
> based on number of RS handler threads or some other heuristic?
>  * Should we consider eliminating queuing for this feature? Instead, if the 
> threadpool is saturated run the seek in-line in the current thread (i.e. 
> revert to normal). This would be more similar to how hedged reads work in 
> HDFS.
>  * Can we expose a metric or logging to help operators know when to scale up 
> the thread pool? If we implemented the 2nd option above we could expose 
> "seeksInCurrentThread" counter to track this, again similar to how hedged 
> reads report on saturation.
> But with all of this said, I wonder if anyone is running this in production 
> and has any updated guidance on when to use this? Does it still make sense 
> given the last 8 years of development in HBase? Would it ever make sense to 
> make it enabled by default?
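
The second and third ideas in the description above (no queuing, fall back to
the calling thread when the pool is saturated, and count those fallbacks) could
look roughly like the sketch below. It is not HBase code: the class name
InlineFallbackSeekPool and its methods are hypothetical, and only the
"seeksInCurrentThread" counter name comes from the description. It uses a
plain java.util.concurrent.ThreadPoolExecutor with a SynchronousQueue, so a
task is either handed straight to an idle worker or rejected, and the
rejection handler runs it in-line.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch of a seek pool that never queues; not part of HBase. */
final class InlineFallbackSeekPool {
  // Counts seeks that ran in the caller's thread because every worker was busy.
  private final LongAdder seeksInCurrentThread = new LongAdder();
  private final ThreadPoolExecutor pool;

  InlineFallbackSeekPool(int threads) {
    // SynchronousQueue has no capacity: with core == max threads, a task is
    // rejected as soon as all workers are busy instead of waiting in a queue.
    this.pool = new ThreadPoolExecutor(
        threads, threads, 60L, TimeUnit.SECONDS,
        new SynchronousQueue<>(),
        (task, executor) -> {
          // Saturated: record the fallback and run the seek in-line.
          seeksInCurrentThread.increment();
          task.run();
        });
  }

  /** Runs the seek on a pool thread if one is free, otherwise in the caller. */
  void submitSeek(Runnable seek) {
    pool.execute(seek);
  }

  long getSeeksInCurrentThread() {
    return seeksInCurrentThread.sum();
  }

  void shutdown() {
    pool.shutdown();
  }

  public static void main(String[] args) {
    InlineFallbackSeekPool seekPool = new InlineFallbackSeekPool(2);
    for (int i = 0; i < 8; i++) {
      seekPool.submitSeek(() -> {
        try {
          Thread.sleep(50); // stand-in for a store file seek
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    seekPool.shutdown();
    System.out.println("seeksInCurrentThread=" + seekPool.getSeeksInCurrentThread());
  }
}
```

A counter like this that climbs steadily would tell an operator the pool is
undersized, without anyone having to poll /dump. The trade-off is that a
saturated pool degrades to the non-parallel behaviour rather than adding queue
wait time to the request.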



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
