Joe McDonnell created IMPALA-9433:
-------------------------------------
Summary: Change FileHandleCache from using a multimap to an
unordered_map
Key: IMPALA-9433
URL: https://issues.apache.org/jira/browse/IMPALA-9433
Project: IMPALA
Issue Type: Bug
Components: Backend
Affects Versions: Impala 3.4.0
Reporter: Joe McDonnell
The file handle cache can contain multiple file handles per filename. Currently
it uses a std::multimap, where the file handles for each filename are a
contiguous set of entries. A lookup will find the beginning of that range and
then iterate through it to find a free one.
A multimap is implemented as a red-black tree with O(log(N)) lookup, so we
should be able to improve this by using a hashtable-based structure such as
unordered_map/unordered_multimap with O(1) lookup.
Another optimization would be to add an intermediary structure for each
filename and hold all the file handles for that file name in a linked list.
Lookup would find this intermediary structure by looking up the filename, then
it would iterate. In the current method, the key/value pair for each file
handle must store a copy of the filename string as the key, even for
duplicates. With the intermediary structure, it would store the filename once
per unique filename.
It also looks like the LRU list would benefit from being a Boost intrusive list
([https://www.boost.org/doc/libs/1_64_0/doc/html/intrusive.html]). Every file
handle is always in the LRU list, so a std::list has a higher memory overhead
and requires more memory accesses. It also complicates the code, because the
FileHandleEntry needs to store a LruListType::iterator to its location in the
LRU list.
These optimizations are low priority, but they provide good ramp-up for some
C++ concepts/APIs.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]