[ https://issues.apache.org/jira/browse/RANGER-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17194292#comment-17194292 ]
Velmurugan Periasamy commented on RANGER-2993: ---------------------------------------------- [~mollonado] - I don't think this is specific to kubernets. Did you see this issue in other env too? CC [~spolavarapu] > Syncing AD/LDAP groups with special characters causing Usersync to get stuck > ---------------------------------------------------------------------------- > > Key: RANGER-2993 > URL: https://issues.apache.org/jira/browse/RANGER-2993 > Project: Ranger > Issue Type: Bug > Components: usersync > Affects Versions: 2.0.0 > Reporter: Georgi Ivanov > Priority: Major > > We are running Ranger on kubernetes. The usersync component runs in a > separate pod as a standalone component. During the initial sync, it throws an > error about a AD/LDAP group that contains a special character. > > {code:java} > 10 Sep 2020 03:24:01 ERROR LdapDeltaUserGroupBuilder [UnixUserSyncThread] - > sink.addOrUpdateGroup failed with exception: Failed to add addorUpdate group > user info, for group: s-TFxLabRun%, users: [...] > 10 Sep 2020 03:24:01 ERROR LdapPolicyMgrUserGroupBuilder [UnixUserSyncThread] > - Failed to add addorUpdate group user info{code} > And after that the sync does not continue to the next cycle. > After 3-4 hours after this error (no logs from the LdapDeltaUserGroupBuilder > or LdapPolicyMgrUserGroupBuilder during that time) we see this log entry > {code:java} > 10 Sep 2020 06:38:27 INFO UserGroupSync [UnixUserSyncThread] - End: initial > load of user/group from source==>sink > 10 Sep 2020 06:38:27 INFO UserGroupSync [UnixUserSyncThread] - Done > initializing user/group source and sink{code} > > And no more logs after that > A strace on the process shows it's stuck in a sleep > {code:java} > # jps > 226 UnixAuthenticationService > 2242 Jps > # strace -p 226 > strace: Process 226 attached > futex(0x7f637e9149d0, FUTEX_WAIT, 227, NULL {code} > > Jstack also shows the what the UnixUserSyncThread is in waiting state > (sleeping). There are some locked threads but I don't think they are related > to the bug. > > {noformat} > # jstack 226 > 2020-09-11 11:22:37 > Full thread dump OpenJDK 64-Bit Server VM (25.232-b09 mixed mode):"Attach > Listener" #1657 daemon prio=9 os_prio=0 tid=0x00007f6350001000 nid=0x798 > waiting on condition [0x0000000000000000] > java.lang.Thread.State: > RUNNABLE"org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner" > #12 daemon prio=5 os_prio=0 tid=0x00007f633c2ac800 nid=0xf0 in Object.wait() > [0x00007f636626a000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0x00000000d56d46e0> (a java.lang.ref.ReferenceQueue$Lock) > at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144) > - locked <0x00000000d56d46e0> (a java.lang.ref.ReferenceQueue$Lock) > at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165) > at > org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3806) > at java.lang.Thread.run(Thread.java:748)"UnixUserSyncThread" #8 prio=5 > os_prio=0 tid=0x00007f6378321800 nid=0xec waiting on condition > [0x00007f63663a4000] > java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at > org.apache.ranger.usergroupsync.UserGroupSync.run(UserGroupSync.java:79) > at java.lang.Thread.run(Thread.java:748)"Service Thread" #7 daemon > prio=9 os_prio=0 tid=0x00007f63780b4800 nid=0xea runnable [0x0000000000000000] > java.lang.Thread.State: RUNNABLE"C1 CompilerThread1" #6 daemon prio=9 > os_prio=0 tid=0x00007f63780b1000 nid=0xe9 waiting on condition > [0x0000000000000000] > java.lang.Thread.State: RUNNABLE"C2 CompilerThread0" #5 daemon prio=9 > os_prio=0 tid=0x00007f63780af000 nid=0xe8 waiting on condition > [0x0000000000000000] > java.lang.Thread.State: RUNNABLE"Signal Dispatcher" #4 daemon prio=9 > os_prio=0 tid=0x00007f63780ad800 nid=0xe7 runnable [0x0000000000000000] > java.lang.Thread.State: RUNNABLE"Finalizer" #3 daemon prio=8 os_prio=0 > tid=0x00007f637807c000 nid=0xe6 in Object.wait() [0x00007f6366e90000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144) > - locked <0x00000000d5588a40> (a java.lang.ref.ReferenceQueue$Lock) > at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165) > at > java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216)"Reference > Handler" #2 daemon prio=10 os_prio=0 tid=0x00007f6378079800 nid=0xe5 in > Object.wait() [0x00007f6366f91000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at java.lang.ref.Reference.tryHandlePending(Reference.java:191) > - locked <0x00000000d5588bf8> (a java.lang.ref.Reference$Lock) > at > java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)"main" #1 > prio=5 os_prio=0 tid=0x00007f637800c000 nid=0xe3 runnable [0x00007f637e913000] > java.lang.Thread.State: RUNNABLE > at java.net.PlainSocketImpl.socketAccept(Native Method) > at > java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409) > at java.net.ServerSocket.implAccept(ServerSocket.java:560) > at > sun.security.ssl.SSLServerSocketImpl.accept(SSLServerSocketImpl.java:348) > at > org.apache.ranger.authentication.UnixAuthenticationService.startService(UnixAuthenticationService.java:301) > at > org.apache.ranger.authentication.UnixAuthenticationService.run(UnixAuthenticationService.java:114) > at > org.apache.ranger.authentication.UnixAuthenticationService.main(UnixAuthenticationService.java:99)"VM > Thread" os_prio=0 tid=0x00007f6378070000 nid=0xe4 runnable"VM Periodic Task > Thread" os_prio=0 tid=0x00007f63780b7000 nid=0xeb waiting on conditionJNI > global references: 244{noformat} > Basically the end up here - > [https://github.com/apache/ranger/blob/master/ugsync/src/main/java/org/apache/ranger/usergroupsync/UserGroupSync.java#L60] > > and never reach the next while loop. > [https://github.com/apache/ranger/blob/master/ugsync/src/main/java/org/apache/ranger/usergroupsync/UserGroupSync.java#L72] > > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)