> That (each process having its own copy) is surprising to me. Unless
> the mapping is such that private copies are required, I'd expect the
> processes to share the page cache entries.

I can't recreate this effect locally using FileChannel.map(); the
library in use in the application uses a slightly more exotic route to
get to mmap, so it could be a bug there; will investigate. I could also
have been imagining it.
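For reference, the distinction as it appears through FileChannel.map():
a READ_WRITE mapping should be backed by the shared page cache, while a
PRIVATE mapping is copy-on-write and would give each process its own
pages on first store. A minimal sketch of the two modes (file path and
mapping size are invented here):

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MappingModes
    {
        public static void main(final String[] args) throws Exception
        {
            try (RandomAccessFile file =
                     new RandomAccessFile("/tmp/mapmode-test.dat", "rw");
                 FileChannel channel = file.getChannel())
            {
                // READ_WRITE: pages are the shared page cache entries, so
                // processes mapping the same region share physical pages.
                MappedByteBuffer shared =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);

                // PRIVATE: copy-on-write; the first store gives this
                // process a private copy of the page, which is the one
                // case where per-process copies would be expected.
                MappedByteBuffer cow =
                    channel.map(FileChannel.MapMode.PRIVATE, 0, 4096);

                shared.put(0, (byte) 1); // visible to other mappers
                cow.put(0, (byte) 1);    // private copy, never written back
            }
        }
    }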
> Is your pre-toucher thread a Java thread doing its pre-touching using
> mapped i/o in the same process? If so, then the pre-toucher thread
> itself will be a high TTSP causer. The trick is to do the pre-touch in
> a thread that is already at a safepoint (e.g. do your pre-touch using
> mapped i/o from within a JNI call, use another process, or do the
> retouch with non-mapped i/o).

Yes, just a Java thread in the same process; I hadn't considered that
it would also cause long TTSP, but of course it's just as likely (or
more likely) to be scheduled off due to a page fault. I could try using
pwrite via FileChannel.write() to do the pre-touching, but I think it
needs to perform a CAS (i.e. don't overwrite data that is already
present), so a JNI method would be the only way to go. Unless just
doing a FileChannel.position(writeLimit).read(buffer) would do the job?
Presumably that is enough to load the page into the cache, and
performing a write is unnecessary.
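Something like the following is what I have in mind for the read-based
retouch; class and method names are mine, and the assumption that a
one-byte positional read per page is enough to pull the page into the
cache is exactly what I'd need to verify:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    public final class ReadBasedPreToucher
    {
        private static final int PAGE_SIZE = 4096;

        // Touch each page in [fromOffset, toOffset) with a positional
        // read (pread(2) under the hood). The read is an explicit i/o
        // call made while the thread is in native code, i.e. already at
        // a safepoint, so a stall here should not hold up
        // time-to-safepoint the way a mapped-memory page fault in
        // JIT-compiled code can. Assumes the file has already been
        // extended past toOffset. No write is performed, so there is
        // nothing to CAS against.
        public static void preTouch(final FileChannel channel,
                                    final long fromOffset,
                                    final long toOffset) throws IOException
        {
            final ByteBuffer scratch = ByteBuffer.allocateDirect(1);
            for (long offset = fromOffset;
                 offset < toOffset;
                 offset += PAGE_SIZE)
            {
                scratch.clear();
                channel.read(scratch, offset);
            }
        }
    }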
>> Cheers,
>>
>> Mark
>>
>> On Tuesday, 5 December 2017 10:53:17 UTC, Gil Tene wrote:
>>>
>>> Page faults in mapped file i/o and counted loops are certainly two
>>> common causes of long TTSP. But there are many other paths that
>>> *could* cause it as well in HotSpot. Without catching it and looking
>>> at the stack trace, it's hard to know which ones to blame. Once you
>>> knock out one cause, you'll see if there is another.
>>>
>>> In the specific stack trace you showed [assuming that trace was
>>> taken during a long TTSP], mapped file i/o is the most likely
>>> culprit. Your trace seems to be around making the page writable for
>>> the first time and updating the file time (which takes a lock), but
>>> even without needing the lock, the fault itself could end up waiting
>>> for the i/o to complete (read page from disk), and that (when Murphy
>>> pays you a visit) can end up waiting behind 100s of other i/o
>>> operations (e.g. when your i/o happens at the same time the kernel
>>> decided to flush some dirty pages in the cache), leading to TTSPs in
>>> the 100s of msec.
>>>
>>> As I'm sure you already know, one simple way to get around
>>> mapped-file-related TTSP is to not use mapped files. Explicit random
>>> i/o calls are always done while at a safepoint, so they can't cause
>>> high TTSPs.
>>>
>>> On Tuesday, December 5, 2017 at 10:30:57 AM UTC+1, Mark Price wrote:
>>>>
>>>> Hi Aleksey,
>>>> thanks for the response. The I/O is definitely one problem, but I
>>>> was trying to figure out whether it was contributing to the long
>>>> TTSP times, or whether I might have some code that was misbehaving
>>>> (e.g. NonCountedLoops).
>>>>
>>>> Your response aligns with my guesswork, so hopefully I just have
>>>> the one problem to solve ;)
>>>>
>>>> Cheers,
>>>>
>>>> Mark
>>>>
>>>> On Tuesday, 5 December 2017 09:24:33 UTC, Aleksey Shipilev wrote:
>>>>>
>>>>> On 12/05/2017 09:26 AM, Mark Price wrote:
>>>>> > I'm investigating some long time-to-safepoint pauses in
>>>>> > oracle/openjdk. The application in question is also suffering
>>>>> > from some fairly nasty I/O problems where latency-sensitive
>>>>> > threads are being descheduled in uninterruptible sleep state due
>>>>> > to needing a file-system lock.
>>>>> >
>>>>> > My question: can the JVM detect that a thread is in
>>>>> > signal/interrupt-handler code and thus treat it as though it is
>>>>> > at a safepoint (as I believe happens when a thread is in native
>>>>> > code via a JNI call)?
>>>>> >
>>>>> > For instance, given the stack trace below, will the JVM need to
>>>>> > wait for the thread to be scheduled back on to CPU in order to
>>>>> > come to a safepoint, or will it be treated as "in-native"?
>>>>> >
>>>>> >     7fff81714cd9 __schedule ([kernel.kallsyms])
>>>>> >     7fff817151e5 schedule ([kernel.kallsyms])
>>>>> >     7fff81717a4b rwsem_down_write_failed ([kernel.kallsyms])
>>>>> >     7fff813556e7 call_rwsem_down_write_failed ([kernel.kallsyms])
>>>>> >     7fff817172ad down_write ([kernel.kallsyms])
>>>>> >     7fffa0403dcf xfs_ilock ([kernel.kallsyms])
>>>>> >     7fffa04018fe xfs_vn_update_time ([kernel.kallsyms])
>>>>> >     7fff8122cc5d file_update_time ([kernel.kallsyms])
>>>>> >     7fffa03f7183 xfs_filemap_page_mkwrite ([kernel.kallsyms])
>>>>> >     7fff811ba935 do_page_mkwrite ([kernel.kallsyms])
>>>>> >     7fff811bda74 handle_pte_fault ([kernel.kallsyms])
>>>>> >     7fff811c041b handle_mm_fault ([kernel.kallsyms])
>>>>> >     7fff8106adbe __do_page_fault ([kernel.kallsyms])
>>>>> >     7fff8106b0c0 do_page_fault ([kernel.kallsyms])
>>>>> >     7fff8171af48 page_fault ([kernel.kallsyms])
>>>>> > ---- java stack trace ends here ----
>>>>>
>>>>> I am pretty sure an out-of-band page fault in a Java thread does
>>>>> not yield a safepoint, at least because safepoint polls happen at
>>>>> given locations in the generated code: we need the pointer map as
>>>>> part of the machine state, and that is generated by HotSpot (only)
>>>>> around the safepoint polls. Page faulting on random read/write
>>>>> insns does not have that luxury. Even if the JVM had intercepted
>>>>> that fault, there is not enough metadata to work on.
>>>>>
>>>>> The stack trace above seems to say you have page faulted and this
>>>>> incurred disk I/O? This is swapping, I think, and all performance
>>>>> bets are off at that point.
>>>>>
>>>>> Thanks,
>>>>> -Aleksey
