Hi guys

I'm working on a draft PR to improve the management of Drill's query profiles. 
https://github.com/apache/drill/pull/1750 

The design basically partitions existing profiles into sub-directories based on 
the structure 'yyyy/MM/dd' (can be customized).
All new profiles are directly written into partitioned directories.
For existing profiles in the `profiles` directory, the Drillbit will partition 
the k most recent profiles (k is configurable) into the sub-directories, but 
only once, during startup, so that the Drillbit does not spend too long 
starting up. 
This substantially improves response time for profile listing in the WebUI, 
especially when there are hundreds of thousands of profiles.
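
To make the layout concrete, here is a minimal sketch of how a partitioned 
path could be derived from a profile's timestamp. It is an illustration only, 
not the code in the PR: the class and method names, the use of UTC, the 
`java.nio.file` API and the example file name are all assumptions.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; names and time zone are assumptions for illustration.
final class ProfilePartitionSketch {

  // Default structure from this mail; the pattern is meant to be customizable.
  static final DateTimeFormatter PARTITION_FORMAT =
      DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneOffset.UTC);

  // Returns <profilesRoot>/<yyyy>/<MM>/<dd>/<profileFileName> for the timestamp.
  static Path partitionedPath(Path profilesRoot, String profileFileName, long timestampMillis) {
    String partition = PARTITION_FORMAT.format(Instant.ofEpochMilli(timestampMillis));
    return profilesRoot.resolve(partition).resolve(profileFileName);
  }

  public static void main(String[] args) {
    // Illustrative file name and timestamp only.
    System.out.println(partitionedPath(Paths.get("profiles"), "example.sys.drill", 0L));
  }
}
```

With these assumptions, a timestamp of 0 ms (the epoch) maps to the 
`1970/01/01` sub-directory under `profiles`.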

However, I still need to figure out what to do for users who want to drop a 
profile into the same directory so that it can be rendered in the WebUI. 

I have two options at the moment (and I'm open to others):

1. Create a thread that periodically checks if there is a profile in the root 
of the `profiles` directory that needs to be 'indexed' into its correct 
partition.
2. Avoid the need for a thread by creating an unpartitioned sub-directory 
within the `profiles` directory that is meant only for hosting profiles for 
WebUI rendering. 
For example, a developer would dump a profile into `profiles/tmp` and view it.
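
Option 1 above could look roughly like the sketch below. It is only meant to 
show the shape of the idea under some assumptions: a local filesystem accessed 
through `java.nio.file`, the file's last-modified time as the partition key, 
and hypothetical class, method and thread names. The actual PR may use a 
different storage API and a different timestamp source.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of option 1; not the implementation in the PR.
final class ProfileIndexerSketch {

  static final DateTimeFormatter PARTITION_FORMAT =
      DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneOffset.UTC);

  // Moves every regular file found directly under the root into its date
  // partition (keyed on last-modified time, an assumption) and returns the
  // number of files moved. Sub-directories, i.e. partitions, are left alone.
  static int indexStrayProfiles(Path profilesRoot) throws IOException {
    List<Path> strays;
    try (Stream<Path> entries = Files.list(profilesRoot)) {
      strays = entries.filter(Files::isRegularFile).collect(Collectors.toList());
    }
    for (Path stray : strays) {
      String partition =
          PARTITION_FORMAT.format(Files.getLastModifiedTime(stray).toInstant());
      Path targetDir = profilesRoot.resolve(partition);
      Files.createDirectories(targetDir);
      // Fails with FileAlreadyExistsException if the target already exists.
      Files.move(stray, targetDir.resolve(stray.getFileName()));
    }
    return strays.size();
  }

  // Runs the scan periodically on a single daemon thread.
  static ScheduledExecutorService startIndexer(Path profilesRoot, long periodSeconds) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "profile-indexer");
      t.setDaemon(true);
      return t;
    });
    scheduler.scheduleWithFixedDelay(() -> {
      try {
        indexStrayProfiles(profilesRoot);
      } catch (IOException | RuntimeException e) {
        // Catch everything: an uncaught exception would cancel later runs.
        System.err.println("Profile indexing pass failed: " + e);
      }
    }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    return scheduler;
  }
}
```

With option 2, the same scan would simply never be scheduled, and a 
`profiles/tmp` directory would be read as-is.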

I'm inclined towards option #1 because it guarantees that all profiles will 
eventually be 'indexed' into their partitions, and that indexing is not 
limited to startup. 

With option #2, for example, if I have 100,000 profiles and my Drillbit is 
configured to partition only the 1000 most recent profiles at startup, I'll 
get all profiles partitioned only after 100 restarts!
However, #2 would ensure that profiles that are there only for rendering 
remain accessible (for sharing again) and do not get indexed. Plus, there is 
no need to add another thread to the Drillbit.

Which one should I go for? Or is there a third alternative?

Thanks in advance!

 ~ Kunal


