[ https://issues.apache.org/jira/browse/DRILL-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469293#comment-16469293 ]
ASF GitHub Bot commented on DRILL-5270:
---------------------------------------

kkhatua commented on issue #1250: DRILL-5270: Improve loading of profiles listing in the WebUI
URL: https://github.com/apache/drill/pull/1250#issuecomment-387839266

The cache is constructed by first listing all the profile files and sorting them (the profile ID is generated as a monotonically decreasing value to ensure sortedness in stores like HBase). The customized TreeSet is used to inject profiles: since the FileSystem is not guaranteed to return the list in order, the TreeSet provides the ordering. We retain only the first N entries (which are, implicitly, the latest profiles). If we add more profiles than the max capacity, the TreeSet is pruned at the rightmost end.

With Guava, the eviction policy provides the option of limiting the size, but it would choose which profile to evict based on least-recently used/accessed, and that basis does not work here, where we want to retain the newest profiles.

Also, this is currently not a true cache, because the moment we detect changes in the underlying store, we reconstruct this 'cache'. Ideally, we'd want to identify only the newest profiles returned from the FileSystem (using filename filters), but the Hadoop API performance is the same irrespective of the filter. What we primarily save is the time spent fetching the file list from the FS and deserializing.

I can move the implementation of the TreeSet to a separate class to clean up the code. That would make debugging simpler too. With Guava, I don't see the value add beyond a lower risk of bugs, which should be minimal with the TreeSet too.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Improve loading of profiles listing in the WebUI
> ------------------------------------------------
>
>                 Key: DRILL-5270
>                 URL: https://issues.apache.org/jira/browse/DRILL-5270
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Web Server
>    Affects Versions: 1.9.0
>            Reporter: Kunal Khatua
>            Assignee: Kunal Khatua
>            Priority: Major
>             Fix For: 1.14.0
>
>
> Currently, as the number of profiles increases, we reload the same list of
> profiles from the FS.
> An ideal improvement would be to detect if there are any new profiles and
> only reload from the disk then. Otherwise, a cached list is sufficient.
> For a directory of 280K profiles, the load time is close to 6 seconds on a 32
> core server. With the caching, we can get it down to as much as a few
> milliseconds.
> To render the cache as invalid, we inspect the last modified time of the
> directory to confirm whether a reload is needed.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
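
For readers following the thread, the approach discussed above (a capacity-bounded TreeSet that is rebuilt only when the directory's last modified time changes) could be sketched roughly as follows. This is not the code from the pull request. The class name `ProfileListingCache`, its methods, and the use of plain strings as profile IDs are hypothetical, and the sketch assumes IDs are fixed-width so that natural string order matches the "smaller ID is newer" ordering described in the comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Supplier;

/** Hypothetical sketch of a bounded, rebuild-on-change profile listing. */
class ProfileListingCache {
  private final int maxCapacity;
  // Ascending order puts the smallest ID first. Profile IDs are assumed to
  // decrease over time, so the first entries are the newest profiles.
  private final TreeSet<String> newest = new TreeSet<>();
  // Directory modification time seen at the last rebuild (-1 = never built).
  private long lastSeenModTime = -1L;

  ProfileListingCache(int maxCapacity) {
    if (maxCapacity < 1) {
      throw new IllegalArgumentException("maxCapacity must be at least 1");
    }
    this.maxCapacity = maxCapacity;
  }

  /** Injects one profile ID and prunes the rightmost (oldest) entry if full. */
  void add(String profileId) {
    newest.add(profileId);
    if (newest.size() > maxCapacity) {
      newest.pollLast();
    }
  }

  /**
   * Rebuilds the listing only if the directory's modification time differs
   * from the one seen at the last rebuild. The (expensive) lister is not
   * invoked otherwise. Returns true if a rebuild happened.
   */
  boolean refresh(long dirModTime, Supplier<List<String>> lister) {
    if (dirModTime == lastSeenModTime) {
      return false; // nothing changed; the cached listing is still valid
    }
    newest.clear();
    for (String id : lister.get()) { // input order is not assumed sorted
      add(id);
    }
    lastSeenModTime = dirModTime;
    return true;
  }

  /** Returns a copy of the retained IDs, newest first. */
  List<String> listing() {
    return new ArrayList<>(newest);
  }
}
```

In a real deployment the `dirModTime` argument would presumably come from the store (for example, the modification time reported for the profiles directory), and the lister would wrap the file system listing. Both are left abstract here so the sketch stays self-contained.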