[
https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894103#comment-13894103
]
Andrew Wang commented on HDFS-5810:
-----------------------------------
Thanks for revving quickly. I looked at the latest patch and agree with the
fixes you mentioned. The stylistic comments were just suggestions, so it's fine
to skip them.
I do have a few follow-on replies and some new comments, though (still working
through the patch):
Replies:
* Was this addressed?
bq. If the mmap and normal caches are configured to a small and a big value
respectively...Can we unlink the two caches completely?
* For the tracing, my goal is to avoid runtime overhead when tracing is off. I
think we savvy HDFS developers could figure out how to read a compressed stack
trace, e.g. just showing the names of methods. Another alternative is putting
the trace tag in a thread-local, maybe with a unique ID; there's a rough sketch
of what I mean after this list.
* On the unref/purge, I missed that the refcount starts off at 2, which is
unusual. I'd like to see the SCCache reference taken explicitly in SCCache, or
at least documented there (see the second sketch after this list).
* On timeouts, we might want to be even *more* aggressive about increasing them.
A recent FB HBase/HDFS paper published at FAST 14 shows that caching on the
order of hours continues to provide benefit. The inflection point is something
like 1 or 2 hours, so that'd be my target. IIUC, {{allocMmap}} has
pressure-based eviction, so we can tolerate it.
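Roughly what I'm picturing for the thread-local trace tag, just as a sketch
(class and method names here are made up, not from the patch):

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: stash a short tag plus a unique ID in a ThreadLocal, so that
// when tracing is off we never build or format a stack trace at all.
public final class TraceTag {
  private static final AtomicLong NEXT_ID = new AtomicLong();
  private static final ThreadLocal<String> CURRENT = new ThreadLocal<String>();

  private TraceTag() {}

  /** Tag the current thread, e.g. TraceTag.set("fetchOrCreate"). */
  public static void set(String tag) {
    CURRENT.set(tag + "-" + NEXT_ID.incrementAndGet());
  }

  /** Read the tag for trace output; cheap constant if the thread is untagged. */
  public static String get() {
    String tag = CURRENT.get();
    return tag == null ? "<untagged>" : tag;
  }

  public static void clear() {
    CURRENT.remove();
  }
}
{code}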
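And for the refcount-of-2 point, the convention I'd like taken explicitly (or
at least documented) in SCCache is roughly the following; this is a
self-contained toy, not the actual patch classes:

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Toy sketch of the refcounting convention: the replica starts with one
// reference (the creating caller's), and the cache takes its own reference
// explicitly when it inserts the replica, dropping it again on purge.
class RefCountSketch {
  static class Replica {
    final AtomicInteger refCount = new AtomicInteger(1); // caller's reference
    void addRef() { refCount.incrementAndGet(); }
    void unref()  { refCount.decrementAndGet(); }        // real code closes FDs at 0
  }

  private final Map<String, Replica> replicaMap = new HashMap<String, Replica>();

  void insert(String key, Replica replica) {
    replica.addRef();               // the cache's reference, taken explicitly here
    replicaMap.put(key, replica);
  }

  void purge(String key) {
    Replica replica = replicaMap.remove(key);
    if (replica != null) {
      replica.unref();              // drop the cache's reference taken in insert()
    }
  }
}
{code}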
New comments:
* Need timeouts for the new tests (I just checked via grep in all changed/new
test files); a one-line example is after this list.
More SCCache-related comments:
* The condvar usage in SCCache is mildly complex. It'd be nice to document
where and why we release the lock and wait on a CV, e.g. in {{fetchOrCreate}};
there's an illustrative sketch of the pattern after this list.
* rename "SCReplica#checkStale" to "isStale" per java/hadoop convention
* What's the rationale for the extremely polymorphic Object pointer in
SCReplica? All the casting is messy
* Determining the correct cache sizes right now is complicated for users. I
think users essentially want to specify a global resource limit like "use this
many FDs for caching" and have the HDFS client do the right thing. Right now,
though, these limits are a) per ClientContext, so you need to know how many
ClientContexts you'll be using, and b) the two caches have entirely separate FD
allocations, so you can waste part of your allocation unless you know in
advance how your workload splits between mmapped and normal reads. I hope this
is possible without config incompatibility; a rough sketch of the shared-budget
idea is after this list.
* Do we really need to release and retake the lock to create an mmap? I didn't
think mmap creation was expensive enough to warrant it, and I noticed you don't
do the same when unmapping.
* In {{allocMmap}}, we decrement {{outstandingMmapCount}}, but isn't that
already handled by the munmap on the line before? If this is legit, a verifying
test case would be good.
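For the test timeouts, I just mean the usual annotation on each new test, e.g.
(test class/method names invented):

{code:java}
import org.junit.Test;

public class TestShortCircuitCacheSketch {
  @Test(timeout = 60000)  // fail after 60s instead of hanging the whole build
  public void testStaleReplicaIsEvicted() throws Exception {
    // ... test body ...
  }
}
{code}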
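On the condvar point, the kind of documentation I'm hoping for is basically a
description of this standard pattern (names here are illustrative, and error
handling for a failed load is omitted):

{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the lock/condition dance in a fetchOrCreate-style method.
class FetchOrCreateSketch {
  static class Replica { /* stands in for the real replica object */ }

  private final ReentrantLock lock = new ReentrantLock();
  private final Condition replicaReady = lock.newCondition();
  private final Map<String, Replica> replicas = new HashMap<String, Replica>();
  private final Set<String> loading = new HashSet<String>();

  Replica fetchOrCreate(String key) throws InterruptedException {
    lock.lock();
    try {
      // If another thread is already creating this replica, release the lock
      // and wait; otherwise that thread could never finish and insert it.
      while (loading.contains(key)) {
        replicaReady.await();
      }
      Replica cached = replicas.get(key);
      if (cached != null) {
        return cached;            // fast path: already cached
      }
      loading.add(key);           // claim the slow path for this key
    } finally {
      lock.unlock();
    }

    // Slow path: open file descriptors / talk to the DataNode *without*
    // holding the cache lock, so lookups for other keys are not blocked.
    Replica created = createReplica(key);

    lock.lock();
    try {
      loading.remove(key);
      replicas.put(key, created);
      replicaReady.signalAll();   // wake any threads waiting in the loop above
      return created;
    } finally {
      lock.unlock();
    }
  }

  private Replica createReplica(String key) {
    return new Replica();         // placeholder for the real FD/mmap setup
  }
}
{code}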
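On the sizing point, what I'm imagining is a single client-wide budget that
both caches draw from, something like the following (the class name and the
idea of one shared pool are mine, not in the patch):

{code:java}
import java.util.concurrent.Semaphore;

// Sketch of a single global FD budget: both the normal short-circuit FD cache
// and the mmap path reserve from one pool, so a user sets one number instead
// of guessing the mmapped/normal split per ClientContext.
class SharedFdBudget {
  private final Semaphore permits;

  SharedFdBudget(int maxCachedFds) {
    this.permits = new Semaphore(maxCachedFds);
  }

  /** Reserve descriptors before caching an FD pair or creating an mmap. */
  boolean tryReserve(int numFds) {
    return permits.tryAcquire(numFds);
  }

  /** Return descriptors to the pool when an FD is closed or an mmap is released. */
  void release(int numFds) {
    permits.release(numFds);
  }
}
{code}

If one instance of something like this were shared across ClientContexts (or
lived above them), it would also address the per-ClientContext multiplication
problem.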
I skimmed through the rest of the changes and it looks like SCCache is really
the meat of the change. I'm pretty happy with SCCache after the latest round of
review, and I'll get through BlockReaderFactory and ClientMmap soon. We're
getting close!
> Unify mmap cache and short-circuit file descriptor cache
> --------------------------------------------------------
>
> Key: HDFS-5810
> URL: https://issues.apache.org/jira/browse/HDFS-5810
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client
> Affects Versions: 2.3.0
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch,
> HDFS-5810.006.patch, HDFS-5810.008.patch, HDFS-5810.015.patch,
> HDFS-5810.016.patch
>
>
> We should unify the client mmap cache and the client file descriptor cache.
> Since mmaps are created from file descriptors held in the cache (currently
> FileInputStreamCache), the two have to be tracked together to do "smarter"
> things like HDFS-5182.