[ https://issues.apache.org/jira/browse/DRILL-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469293#comment-16469293 ]

ASF GitHub Bot commented on DRILL-5270:
---------------------------------------

kkhatua commented on issue #1250: DRILL-5270: Improve loading of profiles 
listing in the WebUI
URL: https://github.com/apache/drill/pull/1250#issuecomment-387839266
 
 
   The cache is constructed by first listing all the profile files and sorting
them (profile IDs are generated as monotonically decreasing values to ensure
sortedness in stores like HBase). A customized TreeSet holds the profiles as
they are inserted: since the FileSystem is not guaranteed to return the list in
order, the TreeSet provides the ordering. We retain only the first N entries
(which are, implicitly, the latest profiles). If adding profiles would exceed
the maximum capacity, the TreeSet is pruned at its rightmost end.
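A minimal Java sketch of the bounded, ordered set described above. The class BoundedProfileSet and its methods are hypothetical, not Drill's actual code. It stores plain String IDs and assumes they are fixed-width, so that String order matches generation order (the newest, i.e. smallest, ID comes first).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/**
 * Hypothetical helper (not Drill's code): keeps at most maxSize profile IDs in
 * ascending String order. Assuming fixed-width IDs that decrease over time,
 * ascending order puts the newest profiles first.
 */
final class BoundedProfileSet {
  private final TreeSet<String> ids = new TreeSet<>();
  private final int maxSize;

  BoundedProfileSet(int maxSize) {
    if (maxSize <= 0) {
      throw new IllegalArgumentException("maxSize must be positive");
    }
    this.maxSize = maxSize;
  }

  /** Inserts an ID in any order; drops the rightmost (oldest) ID when over capacity. */
  void add(String profileId) {
    ids.add(profileId);
    if (ids.size() > maxSize) {
      ids.pollLast();
    }
  }

  /** Returns a snapshot of the retained IDs, newest first. */
  List<String> newestFirst() {
    return new ArrayList<>(ids);
  }

  public static void main(String[] args) {
    BoundedProfileSet set = new BoundedProfileSet(3);
    // IDs arrive unordered, as a FileSystem listing might return them.
    for (String id : new String[] {"0500", "0100", "0400", "0200", "0300"}) {
      set.add(id);
    }
    System.out.println(set.newestFirst()); // [0100, 0200, 0300]
  }
}
```

Because TreeSet.pollLast() removes the greatest element in O(log n), each insert stays cheap even when a large directory listing is streamed into the set.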
   Guava's eviction policy does provide the option of limiting the size, but it
chooses which profile to evict based on least-recent use/access, which does not
match our need to keep the latest profiles.
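To illustrate the mismatch, assuming the concern is that size-based eviction is driven by access recency rather than by profile age: the sketch below uses an access-ordered java.util.LinkedHashMap as a stand-in for a size-bounded, recency-based cache. It is not Guava's implementation (Guava's CacheBuilder.maximumSize documentation says entries may be evicted because they have not been used recently or very often), and LruProfileCache is a hypothetical name. Reading an old profile keeps it alive, so a newer profile is evicted instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustration only: an access-ordered LinkedHashMap that evicts its
 * least-recently-accessed entry once it holds more than maxSize entries.
 */
final class LruProfileCache<K, V> extends LinkedHashMap<K, V> {
  private final int maxSize;

  LruProfileCache(int maxSize) {
    super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the end
    this.maxSize = maxSize;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // Called after each put(); evicts the least-recently-accessed entry.
    return size() > maxSize;
  }

  public static void main(String[] args) {
    LruProfileCache<String, String> cache = new LruProfileCache<>(2);
    cache.put("0300", "oldest profile");
    cache.put("0200", "newer profile");
    cache.get("0300");                  // a user opens the old profile
    cache.put("0100", "newest profile");
    // The newer profile 0200 was evicted; the oldest one (0300) survived.
    System.out.println(cache.keySet()); // [0300, 0100]
  }
}
```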
   Also, this is currently not a true cache: the moment we detect changes in
the underlying store, we reconstruct this 'cache'. Ideally, we would identify
only the newest profiles returned from the FileSystem (using filename filters),
but the Hadoop API takes the same time to list files irrespective of the filter.
   Primarily, we save the time spent fetching the file list from the FS and
deserializing the profiles.
   I can move the TreeSet implementation into a separate class to clean up the
code, which would also make debugging simpler. With Guava, I don't see any added
value beyond a lower risk of bugs, and that risk should be minimal with the
TreeSet too.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve loading of profiles listing in the WebUI
> ------------------------------------------------
>
>                 Key: DRILL-5270
>                 URL: https://issues.apache.org/jira/browse/DRILL-5270
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Web Server
>    Affects Versions: 1.9.0
>            Reporter: Kunal Khatua
>            Assignee: Kunal Khatua
>            Priority: Major
>             Fix For: 1.14.0
>
>
> Currently, as the number of profiles increases, we reload the same list of
> profiles from the FS.
> An ideal improvement would be to detect if there are any new profiles and 
> only reload from the disk then. Otherwise, a cached list is sufficient.
> For a directory of 280K profiles, the load time is close to 6 seconds on a
> 32-core server. With caching, we can bring it down to a few milliseconds.
> To decide whether the cache is stale and a reload is needed, we inspect the
> last-modified time of the directory.
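The invalidation check described in the issue (compare the directory's last-modified time and rebuild the listing only when it has changed) can be sketched as follows. This is a hypothetical class, not Drill's code: it uses java.nio.file on a local directory as a stand-in for the Hadoop FileSystem API, and ProfileListCache, list() and reloadCount() are illustrative names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical cache of the file names in a profile directory. The listing is
 * rebuilt only when the directory's last-modified time differs from the value
 * recorded at the previous rebuild.
 */
final class ProfileListCache {
  private final Path dir;
  private FileTime cachedMtime;            // null until the first load
  private List<String> cachedNames = List.of();
  private int reloads = 0;

  ProfileListCache(Path dir) {
    this.dir = dir;
  }

  /** Returns the sorted file names, re-listing the directory only if it changed. */
  synchronized List<String> list() throws IOException {
    FileTime current = Files.getLastModifiedTime(dir);
    if (!current.equals(cachedMtime)) {
      List<String> names = new ArrayList<>();
      try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
        for (Path entry : entries) {
          names.add(entry.getFileName().toString());
        }
      }
      Collections.sort(names);
      cachedNames = Collections.unmodifiableList(names);
      cachedMtime = current;
      reloads++;
    }
    return cachedNames;
  }

  /** Number of times the directory was actually listed (for illustration). */
  synchronized int reloadCount() {
    return reloads;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("profiles");
    Files.createFile(dir.resolve("b.json"));
    ProfileListCache cache = new ProfileListCache(dir);
    cache.list();
    cache.list();                            // directory unchanged: served from cache
    System.out.println(cache.reloadCount()); // 1
  }
}
```

A directory's modification time changes when entries are added or removed, not when a file's contents change, and its resolution depends on the filesystem, so two additions within the same timestamp tick may be indistinguishable to this check.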



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
