[
https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on HDFS-14304 started by Sahil Takiar.
-------------------------------------------
> High lock contention on hdfsHashMutex in libhdfs
> ------------------------------------------------
>
> Key: HDFS-14304
> URL: https://issues.apache.org/jira/browse/HDFS-14304
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client, libhdfs, native
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Major
>
> While doing some performance profiling of an application using libhdfs, we
> noticed a high amount of lock contention on the {{hdfsHashMutex}} defined in
> {{hadoop-hdfs-native-client/src/main/native/libhdfs/os/mutexes.h}}
> The issue is that every JNI method invocation done by {{hdfs.c}} goes through
> a helper method called {{invokeMethod}}. {{invokeMethod}} calls
> {{globalClassReference}} which acquires {{hdfsHashMutex}} while performing a
> lookup in a {{htable}} (a custom hash table that lives in {{libhdfs/common}})
> (the lock is acquired for both reads and writes). The hash table maps {{char
> *className}} to {{jclass}} objects, it seems the goal of the hash table is to
> avoid repeatedly creating {{jclass}} objects for each JNI call.
> For multi-threaded applications, this lock severely limits that rate at which
> Java methods can be invoked. pstacks show a lot of time being spent on
> {{hdfsHashMutex}}
> {code:java}
> #0 0x00007fba2dbc242d in __lll_lock_wait () from /lib64/libpthread.so.0
> #1 0x00007fba2dbbddcb in _L_lock_812 () from /lib64/libpthread.so.0
> #2 0x00007fba2dbbdc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3 0x00000000027d8386 in mutexLock ()
> #4 0x00000000027d0e7b in globalClassReference ()
> #5 0x00000000027d1160 in invokeMethod ()
> #6 0x00000000027d4176 in readDirect ()
> #7 0x00000000027d4325 in hdfsRead ()
> {code}
> Same with {{perf report}}
> {code:java}
> + 63.36% 0.01% [k] system_call_fastpath
> + 61.60% 0.12% [k] sys_futex
> + 61.45% 0.13% [k] do_futex
> + 57.54% 0.49% [k] _raw_qspin_lock
> + 57.07% 0.01% [k] queued_spin_lock_slowpath
> + 55.47% 55.47% [k] native_queued_spin_lock_slowpath
> - 35.68% 0.00% [k] 0x6f6f6461682f6568
> - 0x6f6f6461682f6568
> - 30.55% __lll_lock_wait
> - 29.40% system_call_fastpath
> - 29.39% sys_futex
> - 29.35% do_futex
> - 29.27% futex_wait
> - 28.17% futex_wait_setup
> - 27.05% _raw_qspin_lock
> - 27.05% queued_spin_lock_slowpath
> 26.30% native_queued_spin_lock_slowpath
> + 0.67% ret_from_intr
> + 0.71% futex_wait_queue_me
> - 2.00% methodIdFromClass
> - 1.94% jni_GetMethodID
> - 1.71% get_method_id
> 0.96% SymbolTable::lookup_only
> - 1.61% invokeMethod
> - 0.62% jni_CallLongMethodV
> 0.52% jni_invoke_nonstatic
> 0.75% pthread_mutex_lock
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]