On Tuesday, December 5, 2017 at 1:26:23 PM UTC+1, Mark Price wrote:

> > That (each process having its own copy) is surprising to me. Unless the
> > mapping is such that private copies are required, I'd expect the
> > processes to share the page cache entries.
>
> I can't recreate this effect locally using FileChannel.map(); the library
> in use in the application uses a slightly more exotic route to get to
> mmap, so it could be a bug there; will investigate. I could also have
> been imagining it.
>
> > Is your pre-toucher thread a Java thread doing its pre-touching using
> > mapped i/o in the same process? If so, then the pre-toucher thread
> > itself will be a high TTSP causer. The trick is to do the pre-touch in
> > a thread that is already at a safepoint (e.g. do your pre-touch using
> > mapped i/o from within a JNI call, use another process, or do the
> > retouch with non-mapped i/o).
>
> Yes, just a java thread in the same process; I hadn't considered that it
> would also cause long TTSP, but of course it's just as likely (or more
> likely) to be scheduled off due to a page fault. I could try using pwrite
> via FileChannel.write() to do the pre-touching, but I think it needs to
> perform a CAS (i.e. don't overwrite data that is already present), so a
> JNI method would be the only way to go. Unless just doing a
> FileChannel.position(writeLimit).read(buffer) would do the job?
> Presumably that is enough to load the page into the cache and performing
> a write is unnecessary.
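[For illustration, Mark's read-at-the-write-limit idea might look roughly like the sketch below. The class name `ReadPretoucher` and the 4 KiB page size are assumptions, and note the caveat that follows: a positional read populates the page cache, but it does not pre-trigger the write-fault (`page_mkwrite`/file-time) path seen in the stack trace.]

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ReadPretoucher {
    private static final int PAGE_SIZE = 4096; // assumed page size

    // Touch every page in [offset, offset + length) with plain positional
    // reads. Unlike a mapped-i/o touch, the read happens through the normal
    // i/o path, so the JVM treats the thread as at a safepoint while it
    // blocks, and it cannot hold up time-to-safepoint.
    static int pretouch(FileChannel channel, long offset, long length) throws IOException {
        ByteBuffer scratch = ByteBuffer.allocate(1);
        int pages = 0;
        for (long pos = offset; pos < offset + length; pos += PAGE_SIZE) {
            scratch.clear();
            channel.read(scratch, pos); // positional read faults the page into the cache
            pages++;
        }
        return pages;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("pretouch", ".dat");
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.allocate(PAGE_SIZE * 4)); // give the file some pages
            System.out.println("touched " + pretouch(ch, 0, ch.size()) + " pages");
        } finally {
            Files.delete(file);
        }
    }
}
```

[As Gil notes in his reply, this removes the page-read i/o from the TTSP path but does not exercise the first-write/file-time-update path, which can still fault later.]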
This (non-mapped reading at the write limit) will work to eliminate the
actual page I/O impact on TTSP, but the time-update path with the lock that
you show in your initial stack trace will probably still hit you. I'd go
either with a JNI CAS, or a forked-off mapped Java pre-toucher as a separate
process (tell it what you want touched via its stdin). Not sure which one is
uglier. The pure-Java one is more portable (for Unix/Linux variants at
least).

> Cheers,
> Mark
>
> On Tuesday, 5 December 2017 10:53:17 UTC, Gil Tene wrote:
>
> Page faults in mapped file i/o and counted loops are certainly two common
> causes of long TTSP. But there are many other paths that *could* cause it
> as well in HotSpot. Without catching it and looking at the stack trace,
> it's hard to know which ones to blame. Once you knock out one cause,
> you'll see if there is another.
>
> In the specific stack trace you showed [assuming that trace was taken
> during a long TTSP], mapped file i/o is the most likely culprit. Your
> trace seems to be around making the page writable for the first time and
> updating the file time (which takes a lock), but even without needing the
> lock, the fault itself could end up waiting for the i/o to complete (read
> page from disk), and that (when Murphy pays you a visit) can end up
> waiting behind 100s of other i/o operations (e.g. when your i/o happens
> at the same time the kernel decides to flush some dirty pages in the
> cache), leading to TTSPs in the 100s of msec.
>
> As I'm sure you already know, one simple way to get around mapped-file
> related TTSP is to not use mapped files. Explicit random i/o calls are
> always done while at a safepoint, so they can't cause high TTSPs.
>
> On Tuesday, December 5, 2017 at 10:30:57 AM UTC+1, Mark Price wrote:
>
> Hi Aleksey,
> thanks for the response.
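[A minimal sketch of Gil's forked-off pre-toucher idea: a hypothetical `PretouchDaemon` run as a separate process, taking "path offset length" lines on stdin. The class name, line protocol, and 4 KiB page size are all assumptions. It only read-faults the mapped pages, since blindly writing from a second process would race the real writer; dirtying the page safely in place is where the JNI-CAS variant would come in.]

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Separate-process pre-toucher: the page-fault stalls happen in this
// process, not in the latency-sensitive JVM, so they cannot extend its TTSP.
public class PretouchDaemon {
    private static final int PAGE_SIZE = 4096; // assumed

    // Map the requested range and read-fault each page; returns pages touched.
    static int touch(String path, long offset, long length) throws Exception {
        int pages = 0;
        try (FileChannel ch = FileChannel.open(Paths.get(path),
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, offset, length);
            for (long pos = 0; pos < length; pos += PAGE_SIZE) {
                map.get((int) pos); // a read fault is enough to page it in
                pages++;
            }
        }
        return pages;
    }

    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 3) continue; // expect: path offset length
            touch(parts[0], Long.parseLong(parts[1]), Long.parseLong(parts[2]));
        }
    }
}
```

[The parent process would fork this once and stream ranges to its stdin as the write limit advances.]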
> The I/O is definitely one problem, but I was trying to figure out whether
> it was contributing to the long TTSP times, or whether I might have some
> code that was misbehaving (e.g. NonCountedLoops).
>
> Your response aligns with my guesswork, so hopefully I just have the one
> problem to solve ;)
>
> Cheers,
> Mark
>
> On Tuesday, 5 December 2017 09:24:33 UTC, Aleksey Shipilev wrote:
>
> On 12/05/2017 09:26 AM, Mark Price wrote:
> > I'm investigating some long time-to-safepoint pauses in oracle/openjdk.
> > The application in question is also suffering from some fairly nasty
> > I/O problems where latency-sensitive threads are being descheduled in
> > uninterruptible sleep state due to needing a file-system lock.
> >
> > My question: can the JVM detect that a thread is in
> > signal/interrupt-handler code and thus treat it as though it is at a
> > safepoint (as I believe happens when a thread is in native code via a
> > JNI call)?
> >
> > For instance, given the stack trace below, will the JVM need to wait
> > for the thread to be scheduled back on to CPU in order to come to a
> > safepoint, or will it be treated as "in-native"?
> > 7fff81714cd9 __schedule ([kernel.kallsyms])
> > 7fff817151e5 schedule ([kernel.kallsyms])
> > 7fff81717a4b rwsem_down_write_failed ([kernel.kallsyms])
> > 7fff813556e7 call_rwsem_down_write_failed ([kernel.kallsyms])
> > 7fff817172ad down_write ([kernel.kallsyms])
> > 7fffa0403dcf xfs_ilock ([kernel.kallsyms])
> > 7fffa04018fe xfs_vn_update_time ([kernel.kallsyms])
> > 7fff8122cc5d file_update_time ([kernel.kallsyms])
> > 7fffa03f7183 xfs_filemap_page_mkwrite ([kernel.kallsyms])
> > 7fff811ba935 do_page_mkwrite ([kernel.kallsyms])
> > 7fff811bda74 handle_pte_fault ([kernel.kallsyms])
> > 7fff811c041b handle_mm_fault ([kernel.kallsyms])
> > 7fff8106adbe __do_page_fault ([kernel.kallsyms])
> > 7fff8106b0c0 do_page_fault ([kernel.kallsyms])
> > 7fff8171af48 page_fault ([kernel.kallsyms])
> > ---- java stack trace ends here ----
>
> I am pretty sure an out-of-band page fault in a Java thread does not
> yield a safepoint, at least because safepoint polls happen at given
> locations in the generated code: we need the pointer map as part of the
> machine state, and that is generated by HotSpot (only) around the
> safepoint polls. Page faulting on random read/write insns does not have
> that luxury. Even if the JVM had intercepted that fault, there is not
> enough metadata to work on.
>
> The stack trace above seems to say you have page faulted and this
> incurred disk I/O? This is swapping, I think, and all performance bets
> are off at that point.
>
> Thanks,
> -Aleksey

--
You received this message because you are subscribed to the Google Groups
"mechanical-sympathy" group. To unsubscribe from this group and stop
receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
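[The counted-loop TTSP cause Gil mentions, and Aleksey's point that safepoint polls exist only at fixed locations in the generated code, can be illustrated with the classic int-vs-long induction-variable sketch below. Whether the counted loop actually loses its poll depends on the JVM version and flags (e.g. `-XX:+UseCountedLoopSafepoints`, and loop strip mining in newer HotSpot builds), so treat this as illustrative rather than guaranteed behavior.]

```java
public class LoopSafepoints {
    // An int-indexed loop with a knowable trip count is a "counted loop":
    // HotSpot's JIT may compile its body with no safepoint poll on the
    // back-edge, so a long pass over a big array can hold up every other
    // thread at time-to-safepoint until the loop exits.
    static long countedSum(int[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    // A long induction variable makes the loop uncounted, so the JIT keeps
    // a safepoint poll on the back-edge and TTSP stays bounded (at the cost
    // of losing some counted-loop optimizations).
    static long uncountedSum(int[] data) {
        long sum = 0;
        for (long i = 0; i < data.length; i++) {
            sum += data[(int) i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        java.util.Arrays.fill(data, 1);
        System.out.println(countedSum(data) + " " + uncountedSum(data));
    }
}
```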
