[
https://issues.apache.org/jira/browse/HDFS-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Colin Patrick McCabe updated HDFS-5096:
---------------------------------------
Attachment: HDFS-5096-caching.005.patch
This patch changes the way we do caching on the backend a bit. The new
approach is based on periodic scanning of the namespace. The scan interval is
controlled by {{dfs.namenode.path.based.cache.refresh.interval.ms}}. The time
complexity of each scan is O(num_path_based_cache_entries *
num_blocks_per_PBCE).
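To make the cost shape concrete, here is a minimal sketch of the rescan loop (class and method names are hypothetical, not the actual patch code): we visit every path-based cache entry and every block under it, so the work per scan is O(entries * blocks_per_entry).

```java
// Hypothetical sketch of the periodic rescan cost: one pass over every
// path-based cache entry and every block it covers.
import java.util.Arrays;
import java.util.List;

public class RescanSketch {
    // Returns the number of blocks visited in one scan.
    static int rescan(List<List<Long>> blocksPerEntry) {
        int scanned = 0;
        // Visit every path-based cache entry...
        for (List<Long> blocks : blocksPerEntry) {
            // ...and every block under it; the real code would decide
            // whether to issue cache/uncache work for each block here.
            for (long blockId : blocks) {
                scanned++;
            }
        }
        return scanned;
    }

    public static void main(String[] args) {
        List<List<Long>> entries = Arrays.asList(
                Arrays.asList(1L, 2L, 3L),
                Arrays.asList(4L, 5L));
        System.out.println(rescan(entries)); // prints 5
    }
}
```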
Eventually, it would be nice to have a more edge-triggered approach to caching.
However, I think that we are always going to want a scanner, to make sure that
we didn't miss anything. So we might as well do the scanner first, since it
covers most of the use cases we want. It also correctly handles still-open
files with non-finalized blocks, as well as the huge number of operations that
move inodes, without much struggle or bugginess.
I added "replication" as a field in PBCDi, PBCDe, PBCE, and plumbed it through
RPC, edit log, and the fsimage. We can now control how many cached replicas we
want, rather than just blindly caching on every DN as before.
The {{CacheManager}} no longer has its own internal lock. Instead, it relies
on the {{FSNamesystem}} lock. This is better, since we already hold the FSN
lock everywhere we call into the {{CacheManager}}. It makes
coordination between the CM and other components a lot easier and eliminates
some suspect locking scenarios.
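The convention can be sketched like this (class and method names are hypothetical stand-ins, not the actual Hadoop classes): the cache-manager state has no lock of its own and asserts that the caller already holds the namesystem lock.

```java
// Sketch of the locking convention: cache state is only mutated while
// the caller holds the (stand-in) FSNamesystem write lock.
import java.util.concurrent.locks.ReentrantReadWriteLock;

class FsnLockSketch {
    private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();
    int directives = 0; // stand-in for CacheManager state

    // Stand-in for a CacheManager mutation: legal only under the FSN lock.
    void addDirective(String path) {
        assert fsnLock.isWriteLockedByCurrentThread()
                : "caller must hold the FSN write lock";
        directives++;
    }

    // Stand-in for an FSNamesystem RPC handler: take the lock, then call in.
    void handleAddDirectiveRpc(String path) {
        fsnLock.writeLock().lock();
        try {
            addDirective(path);
        } finally {
            fsnLock.writeLock().unlock();
        }
    }
}
```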
I unified {{CacheReplicationManager}} and {{CacheManager}}. Both of these
classes were doing the same thing using the same shared state, so it made sense
to only have one. The scanner code remains in {{CacheReplicationMonitor}} to
avoid {{CacheManager}} getting too big. I wanted to use an {{Executor}} for
the CRMon, but I was unable to find one that supported both scheduling the task
at a certain rate, and "poking" the task to get it to run immediately. So I
just used a {{Thread}}, which is simple enough, I think. While I was in there,
I also fixed some improper use of wall-clock time where monotonic time was needed.
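The thread pattern above can be sketched as follows (a simplified stand-in, not the actual CacheReplicationMonitor): a plain {{Thread}} body that runs at a fixed rate but can be "poked" to run immediately, computing its deadline from monotonic time ({{System.nanoTime}}) rather than wall-clock time.

```java
// Sketch of a periodic-but-pokeable monitor thread. Uses monotonic
// time for the scan deadline, so wall-clock adjustments can't skew it.
import java.util.concurrent.atomic.AtomicInteger;

class MonitorSketch implements Runnable {
    private final Object lock = new Object();
    private final long intervalMs;
    private boolean poked = false;
    private boolean stopped = false;
    final AtomicInteger scans = new AtomicInteger(); // observable run count

    MonitorSketch(long intervalMs) { this.intervalMs = intervalMs; }

    /** Wake the monitor so it rescans right away. */
    void poke() {
        synchronized (lock) {
            poked = true;
            lock.notifyAll();
        }
    }

    void shutdown() {
        synchronized (lock) {
            stopped = true;
            lock.notifyAll();
        }
    }

    @Override
    public void run() {
        while (true) {
            // Monotonic deadline for the next scheduled scan.
            long deadline = System.nanoTime() + intervalMs * 1_000_000L;
            synchronized (lock) {
                while (!poked && !stopped) {
                    long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                    if (remainingMs <= 0) break; // interval elapsed
                    try {
                        lock.wait(remainingMs);
                    } catch (InterruptedException e) {
                        return;
                    }
                }
                if (stopped) return;
                poked = false;
            }
            scans.incrementAndGet(); // placeholder for the namespace rescan
        }
    }
}
```

A standard {{ScheduledExecutorService}} gives the fixed rate but has no built-in way to trigger the task early, which is why a hand-rolled wait/notify loop like this is the simpler fit.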
On the DN side, I switched some of the logic to use block IDs rather than Block
objects. We should not need the genstamp and block length here, since that is
taken care of elsewhere. For now, those things remain in the cached block
report and DNA_CACHE, etc., but we should take them out soon since they're not
needed and take up space on the wire.
I also fixed an error message that was inverted (it was saying the configured
mlock space was "less than the datanode's available", but really it should
have been *more*). This could be split out into a separate patch, but it's a
tiny change, so I rolled it up into here.
Cached block state is stored in a slightly different data structure.
Previously, we were re-using a lot of the {{BlockManager}} structures such as
{{BlockInfo}} and {{BlocksMap}}. However, they're poorly suited in some ways,
since they have fields we don't care about, and lack some fields we do. Also,
using them raises the possibility of putting one of our {{BlockInfo}} objects
into one of their {{BlocksMap}} structures, which would create havoc.
The new structure has nodes of type {{CachedBlock}}. Each {{CachedBlock}}
object can be a member of several implicit linked lists. Each
{{DatanodeDescriptor}} has three caching-related lists: pendingCached, cached,
and pendingUncached. As the names imply, they track the blocks which are in
that state with regard to that DN.
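A simplified sketch of that per-datanode state (the real {{CachedBlock}} sits on intrusive linked lists, so one object can belong to several datanodes' lists without wrapper nodes; plain lists here just keep the sketch short, and names are illustrative):

```java
// Simplified sketch of per-DN caching state: three lists tracking a
// block's caching status with regard to that datanode.
import java.util.ArrayList;
import java.util.List;

class CachedBlockSketch {
    final long blockId;
    final short cachedReplication; // desired number of cached replicas
    CachedBlockSketch(long blockId, short cachedReplication) {
        this.blockId = blockId;
        this.cachedReplication = cachedReplication;
    }
}

class DatanodeCachingState {
    // Blocks the NN has asked this DN to cache, not yet confirmed.
    final List<CachedBlockSketch> pendingCached = new ArrayList<>();
    // Blocks this DN currently reports as cached.
    final List<CachedBlockSketch> cached = new ArrayList<>();
    // Blocks the NN has asked this DN to uncache.
    final List<CachedBlockSketch> pendingUncached = new ArrayList<>();

    // When a cache report confirms a block, move it pending -> cached.
    void confirmCached(CachedBlockSketch b) {
        if (pendingCached.remove(b)) {
            cached.add(b);
        }
    }
}
```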
> Automatically cache new data added to a cached path
> ---------------------------------------------------
>
> Key: HDFS-5096
> URL: https://issues.apache.org/jira/browse/HDFS-5096
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode, namenode
> Reporter: Andrew Wang
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-5096-caching.005.patch
>
>
> For some applications, it's convenient to specify a path to cache, and have
> HDFS automatically cache new data added to the path without sending a new
> caching request or a manual refresh command.
> One example is new data appended to a cached file. It would be nice to
> re-cache a block at the new appended length, and cache new blocks added to
> the file.
> Another example is a cached Hive partition directory, where a user can drop
> new files directly into the partition. It would be nice if these new files
> were cached.
> In both cases, this automatic caching would happen after the file is closed,
> i.e. block replica is finalized.
--
This message was sent by Atlassian JIRA
(v6.1#6144)