On Sun, Jun 22, 2014 at 8:56 PM, Emmanuel Lécharny <[email protected]> wrote:
> Hi Kiran,
>
> I did a bit of profiling today, and was able to improve the perfs by 7%.
> The method I sped up is PrepareString. I created a specific method
> which does not create a new char[] when we are dealing with ASCII chars
> only. The gain is huge.
>
great, can you commit it?

>
> Otherwise, most of the time is -as expected- spent in the
> deserialization of entries read from the MasterTable.
>
>
ok

> At this point, I think we should think about what we can do to avoid
> such cost. Most of the time, we will have enough memory to load all the
> elements that will be stored into an index. I'm wondering if it would
> not be better to parse the LDIF once, gather what we can in memory (but
> not keeping the whole entry in memory) and build the index directly,
> then process the master table.
>
>
hmm, at least at one point we end up with keeping full entry

> It's not easy, because we can't know how many elements we can store in
>
yeah

> memory, and when we reach the memory limit, then we have to do something
> which is completely different. If we decide to deal with the memory
> limitation from the beginning, we will pay the price and it will be
> expensive. OTOH, most of the time we won't have to care about the memory
>
yep

> for two reasons :
> - either we have to deal with a limited number of entries in the ldif file
> - or we have enough memory to handle the whole file (on my computer, I
> can provide 14Gb to the JVM, enough to process 5M entries if each one of
> them is 1kb large)
>
> I'm now thinking that it would be better to have 2 possible algorithms :
> - an in-memory one, which does not care about what could happen when we
> reach the end of the memory
> - a 'smarter' one which takes control when we get an OOM
>
>
+1

> This can be done the same way we do with the DN parser : we have a fast
> parser, which throws an exception if it sees a special case, and a full
> parser. Same here, but we catch the OOM instead.
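
To make the fast-path/fallback idea above concrete, here is a rough sketch
of how the two strategies could be chained. It is only an illustration:
BulkLoadSketch, loadWithFallback and both Supplier arguments are made-up
names, not anything from the bulk loader code. It also assumes the
in-memory strategy keeps all of its data in local state, so that nothing
stays referenced once it fails and the GC can reclaim the memory before
the second strategy starts.

```java
import java.util.function.Supplier;

// Hypothetical sketch: none of these names come from the ApacheDS code base.
final class BulkLoadSketch {

    // Tries the in-memory strategy first. If the JVM throws an
    // OutOfMemoryError, the memory-aware strategy takes over.
    static <T> T loadWithFallback(Supplier<T> inMemory, Supplier<T> memoryAware) {
        try {
            return inMemory.get();
        } catch (OutOfMemoryError oom) {
            // Assumption: the fast path held its data only in local state,
            // so it is unreachable here and can be garbage collected.
            return memoryAware.get();
        }
    }

    public static void main(String[] args) {
        // Simulate the fast path running out of memory.
        String used = BulkLoadSketch.<String>loadWithFallback(
            () -> { throw new OutOfMemoryError("simulated"); },
            () -> "memory-aware");
        System.out.println(used);
    }
}
```

Catching OutOfMemoryError is generally fragile (other threads can be hit
by the same error), so this probably only makes sense for a single-threaded
offline tool, and the work done by the first strategy is lost when the
fallback kicks in.
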
>
> Of course, we can probably try to 'predict' which one to use when we
> start the bulk load, to avoid spending time with the in-memory process.
> Or we can let the user decide.
>
> Wdyt ?
>
yep, been thinking about the earlier ideas as well, but for now just moved
the bulkloader to its own module

-- 
Kiran Ayyagari
http://keydap.com
