Hi guys, I'm working on a draft PR to improve the management of Drill's query profiles: https://github.com/apache/drill/pull/1750
The design partitions existing profiles into sub-directories based on the structure 'yyyy/MM/dd' (this can be customized). All new profiles are written directly into the partitioned directories. For existing profiles in the `profiles` directory, the Drillbit will partition the k most recent profiles (k is configurable) into the sub-directories, but only once, during startup, so that a Drillbit does not spend too long starting up. This substantially improves the response time for profile listing in the WebUI, especially when the number of profiles is in the hundreds of thousands.

However, I have the challenge of figuring out what to do for users who want to drop a profile into the same directory for the purpose of rendering it in the WebUI. I have two options at the moment (and am open to others):

1. Create a thread that periodically checks whether there is a profile in the root of the `profiles` directory that needs to be 'indexed' into its correct partition.
2. Avoid the need for a thread by creating an unpartitioned sub-directory within the `profiles` directory that is meant only for hosting profiles for WebUI rendering. For example, a developer would drop a profile into `profiles/tmp` and view it there.

I'm inclined towards option #1 because it guarantees that all profiles will eventually be 'indexed' into their partitions, and the partitioning is no longer limited to startup. With option #2, if I have 100,000 profiles and my Drillbit is configured to partition only the 1,000 most recent profiles at startup, all profiles will be partitioned only after 100 restarts!

However, #2 would ensure that profiles that are there only for rendering remain accessible (for sharing again) and do not get indexed. Plus, there is no need to add another thread to the Drillbit.

Which one should I go for? Or is there a third alternative?

Thanks in advance!
~ Kunal
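
P.S. For illustration, the partitioning step described above could look roughly like the following. This is a minimal sketch and not the code in the PR: the class and method names are hypothetical, it works on local files via `java.nio` rather than Drill's profile store, and it assumes the partition is derived from the file's last-modified time.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; names are illustrative and not taken from the PR.
class ProfilePartitionSketch {

  // Assumed default layout, matching the 'yyyy/MM/dd' structure in the proposal.
  static final DateTimeFormatter PARTITION_FORMAT =
      DateTimeFormatter.ofPattern("yyyy/MM/dd");

  // Maps a timestamp to its relative partition directory, e.g. "2019/04/15".
  static String partitionFor(Instant timestamp, ZoneId zone) {
    return PARTITION_FORMAT.format(timestamp.atZone(zone));
  }

  // Moves one profile from the root of the profiles directory into its
  // partition. Assumption: the last-modified time decides the partition.
  static Path moveIntoPartition(Path profilesRoot, Path profile, ZoneId zone)
      throws IOException {
    Instant modified = Files.getLastModifiedTime(profile).toInstant();
    Path targetDir = profilesRoot.resolve(partitionFor(modified, zone));
    Files.createDirectories(targetDir);
    return Files.move(profile, targetDir.resolve(profile.getFileName()),
        StandardCopyOption.ATOMIC_MOVE);
  }
}
```

With option #1, a periodic task would call something like `moveIntoPartition` for each profile it finds in the root of `profiles`. With option #2, the same call would simply skip anything under `profiles/tmp`.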
