Re: [DISCUSS] StoreFileScanner parallel seek -- productionize or drop?

Andrew Purtell Wed, 01 Dec 2021 07:07:34 -0800

Unless the potential payoff is significant (yes, this might be hard to
guess), I would vote for dropping a complex, incomplete (IMHO),
disabled-by-default 'feature' that, I would estimate, is rarely if ever
used.



On Wed, Dec 1, 2021 at 8:05 AM Bryan Beaudreault
<[email protected]> wrote:

> hbase.storescanner.parallel.seek.enable was added a few years ago in
> https://issues.apache.org/jira/browse/HBASE-7495, but still defaults to
> disabled. The description says "Enables StoreFileScanner parallel-seeking
> in StoreScanner, a feature which can reduce response latency under special
> conditions".
>
> It's not very clear what "special conditions" means. Reading through the
> entire comment history on that issue seems to indicate it can help when you
> have "high random read, low cache hit rate, many store files".
>
> We have a bunch of clusters with this shape, and in fact we use SSDs for
> all storage so I figured this might help a lot. I tried setting this to
> true on one RegionServer of one of our highest QPS clusters hoping I'd see
> some clear improvement. This very simple test was pretty much a wash, so I
> need to do more methodical testing.
>
> In the test one thing became clear though – is the default thread pool size
> of 10 good enough for my use-case? I have no way of knowing, as there is no
> logging or metrics that I can find around thread pool saturation. What I
> ended up doing was spamming refresh of the /dump endpoint of the RS, and
> noticed that there were sometimes 1-5 tasks queued for the RS_PARALLEL_SEEK
> executor. This indicates maybe I should scale the thread pool, but
> use-cases change over time so this seems like not a great way to determine
> that.
>
> Task queuing seems not great for a feature which is aimed at reducing
> latencies. I wonder if we should consider some changes to make this
> easier to deploy in production. Here are some ideas I had:
>
>    - Can we generate a better default value for the thread pool size, maybe
>    based on number of RS handler threads or some other heuristic?
>    - Should we consider eliminating queuing for this feature? Instead, if
>    the thread pool is saturated, run the seek in-line in the current thread
>    (i.e. revert to normal). This would be more similar to how hedged reads
>    work in HDFS.
>    - Can we expose a metric or logging to help operators know when to scale
>    up the thread pool? If we implemented the 2nd option above we could
>    expose a "seeksInCurrentThread" counter to track this, again similar to
>    how hedged reads report on saturation.
>
> But with all of this said, I wonder if anyone is running this in production
> and has any updated guidance on when to use this? Does it still make sense
> given the last 8 years of development in HBase? Would it ever make sense to
> make it enabled by default?
>
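
For illustration only, the second and third ideas in the quoted list (no
queuing, run the seek in the calling thread when the pool is saturated, and
count how often that happens) could be sketched roughly as below. This is
not HBase code: the class InlineFallbackSeekPool and its methods are
hypothetical names, and only the counter name "seeksInCurrentThread" comes
from the message above. It relies on standard java.util.concurrent
behaviour: a SynchronousQueue holds no tasks, so a submission is either
handed to a free thread or rejected, and the rejection handler runs it
in-line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a seek pool that never queues; not HBase code. */
final class InlineFallbackSeekPool {
  // Counts tasks that ran in the submitting thread because the pool was full.
  private final AtomicLong seeksInCurrentThread = new AtomicLong();
  private final ThreadPoolExecutor pool;

  InlineFallbackSeekPool(int maxThreads) {
    // SynchronousQueue has no capacity: a task is handed straight to a free
    // worker or rejected. The handler below turns a rejection into an
    // in-line run on the caller's thread and records it.
    pool = new ThreadPoolExecutor(maxThreads, maxThreads, 60L, TimeUnit.SECONDS,
        new SynchronousQueue<>(),
        new ThreadPoolExecutor.CallerRunsPolicy() {
          @Override
          public void rejectedExecution(Runnable task, ThreadPoolExecutor executor) {
            seeksInCurrentThread.incrementAndGet();
            super.rejectedExecution(task, executor);
          }
        });
  }

  /** Submits every seek, then waits for all of them and returns the results in order. */
  <T> List<T> runAll(List<Callable<T>> seeks) {
    List<Future<T>> futures = new ArrayList<>();
    for (Callable<T> seek : seeks) {
      futures.add(pool.submit(seek));
    }
    List<T> results = new ArrayList<>();
    try {
      for (Future<T> future : futures) {
        results.add(future.get());
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for seeks", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("seek failed", e.getCause());
    }
    return results;
  }

  long seeksInCurrentThread() {
    return seeksInCurrentThread.get();
  }

  void shutdown() {
    pool.shutdown();
  }
}
```

An operator could then watch the counter: a value that keeps climbing
suggests the pool is too small for the workload, without any request ever
having waited in a queue.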


-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk
