This would be a great improvement (and long overdue). Thanks for working
on it.
I would lean toward option #2, and perhaps add a drillbit startup option
that forces partitioning of all existing profiles (the default can stay at
the 1000 profiles that you proposed).
Having to set the option makes the user aware that startup could take longer.
A separate thread is not really needed, since once the initial
partitioning is done, new profiles are written to the sub-directories
anyway.
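
A rough sketch of the startup step I have in mind, not taken from the PR. The class name, the method name, and the convention that a negative limit means "force all" are assumptions for illustration only; the layout follows the 'yyyy/MM/dd' structure you described, keyed here on each file's last-modified time in UTC.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Drill: partitions profiles at startup.
public class ProfilePartitioner {
  // Assumed layout, mirroring the proposed 'yyyy/MM/dd' structure.
  private static final DateTimeFormatter LAYOUT =
      DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneOffset.UTC);

  /**
   * Moves profile files found in the root of profilesDir into dated
   * sub-directories. Only the maxProfiles most recent files are moved;
   * a negative value forces partitioning of every file.
   * Returns the number of files moved.
   */
  public static int partition(Path profilesDir, int maxProfiles) throws IOException {
    List<Path> candidates;
    try (Stream<Path> files = Files.list(profilesDir)) {
      candidates = files
          .filter(Files::isRegularFile) // skip already-created sub-directories
          .sorted(Comparator.<Path>comparingLong(ProfilePartitioner::lastModifiedMillis).reversed())
          .limit(maxProfiles < 0 ? Long.MAX_VALUE : maxProfiles)
          .collect(Collectors.toList());
    }
    for (Path profile : candidates) {
      String partition = LAYOUT.format(Files.getLastModifiedTime(profile).toInstant());
      Path targetDir = profilesDir.resolve(partition);
      Files.createDirectories(targetDir);
      Files.move(profile, targetDir.resolve(profile.getFileName()));
    }
    return candidates.size();
  }

  // Wraps the checked exception so the method can be used as a sort key.
  private static long lastModifiedMillis(Path p) {
    try {
      return Files.getLastModifiedTime(p).toMillis();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With the default, something like `ProfilePartitioner.partition(dir, 1000)` would run at startup; the forced mode would call it with `-1` and accept the longer startup.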

Aman

On Tue, Apr 16, 2019 at 4:57 PM Kunal Khatua <[email protected]> wrote:

> Hi guys
>
> I'm working on a draft PR to improve the management of Drill's query
> profiles.
> https://github.com/apache/drill/pull/1750
>
> The design basically partitions existing profiles into sub-directories
> based on the structure 'yyyy/MM/dd' (can be customized).
> All new profiles are directly written into partitioned directories.
> For existing profiles in the `profiles` directory, the Drillbit will
> partition the k-most-recent profiles (configurable) into the
> sub-directories; but only once (during startup) to ensure we don't have a
> Drillbit spending too long a time during startup.
> This substantially improves response time for profile listing in the
> WebUI, especially when the number of profiles is in the hundreds of
> thousands.
>
> However, I still need to figure out what to do for users who might want
> to drop a profile into the same directory in order to render it in the
> WebUI.
>
> I have two options at the moment (and open to others):
>
> 1. Create a thread that periodically checks if there is a profile in the
> root of the `profiles` directory that needs to be 'indexed' into its
> correct partition.
> 2. Avoid the need for a thread by creating an unpartitioned
> sub-directory within the `profiles` directory that is only meant for
> hosting profiles for WebUI rendering.
> For example, a developer would drop a profile into `profiles/tmp` and
> view it.
>
> I'm inclined towards option #1 because it guarantees that eventually all
> profiles will be 'indexed' into their partitions, and that we are not
> limited to doing it only during startup.
>
> With option #2, for example, if I have 100,000 profiles and my Drillbit is
> configured to partition only the 1000 most recent profiles at startup, I'll
> get all profiles partitioned only after 100 restarts!
> However, #2 would ensure that profiles that are only for the purpose of
> rendering can be accessible (for sharing again) and not get indexed. Plus,
> there is no need for an additional thread to be added to the Drillbit.
>
> Which one should I go for? Or is there a third alternative?
>
> Thanks in advance!
>
>  ~ Kunal
