On 27/01/15 23:07, Harris, Christopher P wrote:
> Hi, Emmanuel.
>
> "Can you tell us how you do that? Ie, are you using a plain new connection
> for each thread you spawn?"
> Sure. I can tell you how I am implementing a multi-threaded approach to read
> all of LDAP/AD into memory. I'll do the next best thing... paste my code at
> the end of my response.
>
> "In any case, the TimeOut is the default LdapConnection timeout (30 seconds):"
> Yes, I noticed mention of the default timeout in your User Guide.
>
> "You have to set the LdapConnectionConfig timeout for all the created
> connections to use it. There is a setTimeout() method for that which has been
> added in 1.0.0-M28."
> When visiting your site while seeking to explore connection pool options, I
> noticed that you recently released M28 and fixed DIRAPI-217, and decided to
> update my pom.xml to M28 and test out the PoolableLdapConnectionFactory.
> Great job, btw. Keep up the good work!
>
> Oh, and your example needs to be updated to use
> DefaultPoolableLdapConnectionFactory instead of PoolableLdapConnectionFactory.
>
> "config.setTimeOut( whatever fits you );"
> Very good to know. Thank you!
>
> "It is the right way."
> Sweeeeeeet!
>
> "Side note: you may face various problems when pulling everything from an AD
> server. Typically, the AD config might not let you pull more than
> 1000 entries, as there is a hard limit you need to change on AD if you want
> to get more entries.
>
> Otherwise, the approach - ie, using multiple threads - might seem good, but
> the benefit is limited. Pulling entries from the server is fast; you should
> be able to get tens of thousands per second with one single thread. I'm not
> sure how AD supports concurrent searches anyway. Last but not least, it's
> likely that AD does not allow more than a certain number of concurrent
> threads to run, which might lead to contention at some point."
>
> Ah, this is why I wanted to reach out to you guys.
> You guys know this kind of in-depth information about LDAP and AD. So, I may
> adapt my code to a single-threaded approach then. I can live with that. I
> need to pull about 40k-60k entries, so tens of thousands of entries per
> second works for me. I may need to run the code by you then, if I go with a
> single-threaded approach, to check that I'm going about it in the most
> efficient manner.
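For reference, the pooled M28 setup being discussed - with the config-level timeout applied so every pooled connection picks it up - looks roughly like this. Host, port, bind DN and credentials are placeholders, and the timeout setter is the `setTimeout()` method described above as added in M28; double-check the exact name against your M28 javadoc:

```java
import org.apache.directory.ldap.client.api.DefaultPoolableLdapConnectionFactory;
import org.apache.directory.ldap.client.api.LdapConnection;
import org.apache.directory.ldap.client.api.LdapConnectionConfig;
import org.apache.directory.ldap.client.api.LdapConnectionPool;

public class PoolSetup {
    public static void main(String[] args) throws Exception {
        LdapConnectionConfig config = new LdapConnectionConfig();
        config.setLdapHost("ldap.example.com");          // placeholder host
        config.setLdapPort(389);                         // placeholder port
        config.setName("cn=admin,dc=example,dc=com");    // placeholder bind DN
        config.setCredentials("secret");                 // placeholder password
        // The config-level timeout discussed above: inherited by every
        // connection the pool creates, instead of the 30 s default.
        config.setTimeout(300000L);

        // M28: use DefaultPoolableLdapConnectionFactory, not the old
        // PoolableLdapConnectionFactory from the outdated example.
        DefaultPoolableLdapConnectionFactory factory =
            new DefaultPoolableLdapConnectionFactory(config);
        LdapConnectionPool pool = new LdapConnectionPool(factory);

        LdapConnection connection = pool.getConnection();
        try {
            // ... run your searches here ...
        } finally {
            pool.releaseConnection(connection);
        }
        pool.close();
    }
}
```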
The problem with the multi-threaded approach is that you *have* to know which entry has children, because they won't give you such an info. So you will end up doing a search for every single entry you get at one level, with scope ONE_LEVEL, and most of the time, you will just get the entry itself. That would more than double the time it takes to grab everything.

> And now time for some code...
>
> import java.io.IOException;
> import java.util.Iterator;
> import java.util.List;
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import java.util.concurrent.TimeUnit;
> import java.util.logging.Level;
> import java.util.logging.Logger;
>
> import org.apache.commons.pool.impl.GenericObjectPool;
> import org.apache.directory.api.ldap.model.cursor.CursorException;
> import org.apache.directory.api.ldap.model.cursor.SearchCursor;
> import org.apache.directory.api.ldap.model.entry.Entry;
> import org.apache.directory.api.ldap.model.exception.LdapException;
> import org.apache.directory.api.ldap.model.message.Response;
> import org.apache.directory.api.ldap.model.message.SearchRequest;
> import org.apache.directory.api.ldap.model.message.SearchRequestImpl;
> import org.apache.directory.api.ldap.model.message.SearchResultEntry;
> import org.apache.directory.api.ldap.model.message.SearchScope;
> import org.apache.directory.api.ldap.model.name.Dn;
> import org.apache.directory.ldap.client.api.DefaultLdapConnectionFactory;
> import org.apache.directory.ldap.client.api.LdapConnection;
> import org.apache.directory.ldap.client.api.LdapConnectionConfig;
> import org.apache.directory.ldap.client.api.LdapConnectionPool;
> import org.apache.directory.ldap.client.api.LdapNetworkConnection;
> import org.apache.directory.ldap.client.api.DefaultPoolableLdapConnectionFactory;
> import org.apache.directory.ldap.client.api.ValidatingPoolableLdapConnectionFactory;
> import org.apache.directory.ldap.client.api.SearchCursorImpl;
> import org.apache.directory.ldap.client.template.EntryMapper;
> import org.apache.directory.ldap.client.template.LdapConnectionTemplate;
>
> /**
>  * @author Chris Harris
>  */
> public class LdapClient {
>
>     public LdapClient() {
>     }
>
>     public Person searchLdapForCeo() {
>         return this.searchLdapUsingHybridApproach(ceoQuery);
>     }
>
>     public Map<String, Person> buildLdapMap() {
>         SearchCursor cursor = new SearchCursorImpl(null, 300000, TimeUnit.SECONDS);
>         LdapConnection connection = new LdapNetworkConnection(host, port);
>         connection.setTimeOut(300000);
>         Entry entry = null;
>
>         try {
>             connection.bind(dn, pwd);
>
>             LdapClient.recursivelyGetLdapDirectReports(connection, cursor, entry, ceoQuery);
>             System.out.println("Finished all Ldap Map Builder threads...");
>         } catch (LdapException ex) {
>             Logger.getLogger(LdapClient.class.getName()).log(Level.SEVERE, null, ex);
>         } catch (CursorException ex) {
>             Logger.getLogger(LdapClient.class.getName()).log(Level.SEVERE, null, ex);
>         } finally {
>             cursor.close();
>             try {
>                 connection.close();
>             } catch (IOException ex) {
>                 Logger.getLogger(LdapClient.class.getName()).log(Level.SEVERE, null, ex);
>             }
>         }
>
>         return concurrentPersonMap;
>     }
>
>     private static Person recursivelyGetLdapDirectReports(LdapConnection connection,
>             SearchCursor cursor, Entry entry, String query) throws CursorException {
>         Person p = null;
>         EntryMapper<Person> em = Person.getEntryMapper();
>
>         try {
>             SearchRequest sr = new SearchRequestImpl();
>             sr.setBase(new Dn(searchBase));
>             StringBuilder sb = new StringBuilder(query);
>             sr.setFilter(sb.toString());
>             sr.setScope( SearchScope.SUBTREE );

Ahhhhh !!!! STOP !!!

Ok, no need to go any further in your code. You are doing a SUBTREE search on *every single entry* you pull from the base. If you have 40 000 entries, every one of those searches drags back a whole subtree again, so the number of searches and the volume of entries you transfer explode as the tree grows. No wonder you get a timeout...
Imagine you have such a tree:

    root
      A1
        B1
          C1
          C2
        B2
          C3
          C4
      A2
        B3
          C5
          C6
        B4
          C7
          C8

The search on root will pull A1, A2, B1, B2, B3, B4, C1..C8 (14 entries -> 14 searches).
Then the search on A1 will pull B1, C1, C2, B2, C3, C4 (6 entries -> 6 searches).
The search on A2 will pull B3, C5, C6, B4, C7, C8 (6 entries -> 6 searches).
Then the search on B1 will pull C1, C2 (2 entries -> 2 searches), and the same for B2, B3 and B4, so 4 × 2 = 8 searches for the B level.

At the end, you have done 1 + 14 + 12 + 8 = 35 searches, when you have only 15 entries...

If you want to see what your algorithm is doing, just do a search using a SearchScope.ONE_LEVEL instead. You will only do roughly O( 40 000 ) searches, which is way less than what you are doing. But anyway, doing a search on the root with a SUBTREE scope will be way faster, because you will do only one single search.
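To make the tally above concrete, here is a small self-contained sketch (plain Java, no LDAP involved) that builds the 15-entry example tree and counts what the per-entry SUBTREE strategy fetches - one initial fetch of root, plus a subtree search under every entry, each returning that entry's proper descendants - against the single SUBTREE search from root:

```java
import java.util.ArrayList;
import java.util.List;

public class SearchCounter {
    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }

        // Number of proper descendants: the entries a SUBTREE search
        // under this entry would return (excluding the entry itself).
        int descendants() {
            int n = 0;
            for (Node c : children) {
                n += 1 + c.descendants();
            }
            return n;
        }

        // All nodes in this subtree, including this one.
        List<Node> allNodes() {
            List<Node> all = new ArrayList<>();
            all.add(this);
            for (Node c : children) {
                all.addAll(c.allNodes());
            }
            return all;
        }
    }

    public static void main(String[] args) {
        // The example tree: root -> A1, A2; each A -> two Bs; each B -> two Cs.
        Node root = new Node("root");
        int b = 1, c = 1;
        for (int a = 1; a <= 2; a++) {
            Node aNode = new Node("A" + a);
            for (int i = 0; i < 2; i++) {
                Node bNode = new Node("B" + b++);
                for (int j = 0; j < 2; j++) {
                    bNode.children.add(new Node("C" + c++));
                }
                aNode.children.add(bNode);
            }
            root.children.add(aNode);
        }

        List<Node> all = root.allNodes();
        System.out.println("Entries in the tree: " + all.size());

        // Per-entry SUBTREE strategy: 1 fetch of root, plus one subtree
        // result set under every entry (14 + 12 + 8 + 0 for the example).
        int fetched = 1;
        for (Node n : all) {
            fetched += n.descendants();
        }
        System.out.println("Entries fetched by per-entry SUBTREE searches: " + fetched);

        // Single SUBTREE search from root: one request, every entry once.
        System.out.println("Entries fetched by one SUBTREE search from root: "
            + (1 + root.descendants()));
    }
}
```

Running it reproduces the 1 + 14 + 12 + 8 = 35 arithmetic from the example, against 15 entries delivered by the single search.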