Re: [iText-questions] performance follow up

Mike Marchywka Fri, 23 Apr 2010 08:20:58 -0700


This is the draft I mentioned earlier. It was getting a bit
convoluted from over-qualifying each assertion, but if you
are using appends a lot, consider the basic idea of finding
the delimiters FIRST and then doing one or more array ops,
or avoiding string creation altogether.
I don't have any idea what you are doing with these
strings you parse, but if you are building dictionaries,
consider things like the following.
On large dictionaries with coherent access patterns,
hash tables may not be as efficient as sorted structures
with the right indexing. (This may not be apparent until
you start VM thrashing, but if you have ordered queries on
static dictionaries, a sparse hash can make a mess of a
cache compared to a well-thought-out binary search on
a compact representation of your strings.) I'm not
entirely sure the multi-pass approach I try to
outline below has a lot of merit, but you would
need to consider some issues along these lines.
 



________________________________
> From: brave...@gmail.com
> Date: Fri, 23 Apr 2010 12:27:42 +0200
> To: itext-questions@lists.sourceforge.net
> Subject: Re: [iText-questions] performance follow up
>
>
>
> Hello Paulo,
>
> On Apr 22, 2010, at 11:43 PM, Paulo Soares wrote:
> FYI I already use a table to map the char to the result for the delimiter 
> testing and the speed improvement was zero in relation to plain comparisons.
>
> Paulo
>
> You are right ... changing to a table makes no difference. I checked this 
> with the profiler and the results stay the same.
>

Why does that method take an int param rather than a char or,
better, a byte? Implicit casts are not normally free. A lookup
table probably needs to convert the array index to int anyway,
but if you are doing specific boolean tests comparing byte to
byte, you may be able to avoid some JVM junk. In any case, the
method code could hide that, if it is needed at all.
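
To make the byte-versus-int point concrete, here is a rough sketch
of a delimiter test that takes a byte and hides the index conversion
inside the method. The class name, method names and the delimiter
set are all made up for illustration; they are not taken from the
iText source. Whether either form beats the existing int version
would have to be measured with the profiler.

```java
final class DelimTable {
    // Hypothetical delimiter set, chosen only for illustration.
    private static final boolean[] IS_DELIM = new boolean[256];

    static {
        for (char c : "()<>[]{}/% \t\r\n\f\0".toCharArray()) {
            IS_DELIM[c] = true;
        }
    }

    // Table form: the mask turns the signed byte into an index 0..255,
    // so callers never deal with the conversion themselves.
    static boolean isDelim(byte b) {
        return IS_DELIM[b & 0xFF];
    }

    // Plain-comparison form for a small subset (whitespace only),
    // comparing byte to byte with no table access.
    static boolean isWhitespace(byte b) {
        return b == ' ' || b == '\t' || b == '\r' || b == '\n'
                || b == '\f' || b == 0;
    }
}
```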

As should be clear, I'm not familiar with the code and
don't have it in front of me, but here are a few thoughts.
Often reordering operations can help, but it may not
be obvious a priori which approach is best.
Multiple passes are generally bad compared to working
on blocks that preserve locality and maximize low-level
memory cache hits. However, due to other
issues it could make sense, or at least multiple
passes in small blocks.
You could consider inlining this method in one place, along
with any similar ones, and making a classification pass during
which you scan each char in your input data and assign it a class.
Then make a second pass through your now huge "data", in which
each char is followed by its class, and base the processing on a
big switch statement that switches on the class and whatever
state info you have kept.
Or consider building a table of whitespace locations on your
first pass, etc. If you are currently calling something
like an append(char) method on each char, you may be better off
finding the limits and creating a new string with
String(byte[], offset, length), etc.
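
Something like the following is what I have in mind for "find the
limits first, then build each string in one go". It is only a
sketch: the class and method names are invented, tokens are assumed
to be separated by plain whitespace, and the bytes are assumed to be
single-byte (ISO-8859-1) text. The real tokenizer will have more
delimiter rules than this.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TokenScan {
    static boolean isWhitespace(byte b) {
        return b == ' ' || b == '\t' || b == '\r' || b == '\n';
    }

    // First pass: record [start, end) offsets for each token.
    // No strings and no per-char append calls are made here.
    static int[] findLimits(byte[] data) {
        // n bytes hold at most (n + 1) / 2 tokens, two ints per token.
        int[] limits = new int[data.length + 1];
        int count = 0;
        int i = 0;
        while (i < data.length) {
            while (i < data.length && isWhitespace(data[i])) i++;
            if (i == data.length) break;
            int start = i;
            while (i < data.length && !isWhitespace(data[i])) i++;
            limits[count++] = start;
            limits[count++] = i;
        }
        return Arrays.copyOf(limits, count);
    }

    // Second pass: one String construction per token, straight
    // from the byte array.
    static List<String> toStrings(byte[] data, int[] limits) {
        List<String> out = new ArrayList<>(limits.length / 2);
        for (int k = 0; k < limits.length; k += 2) {
            out.add(new String(data, limits[k], limits[k + 1] - limits[k],
                    StandardCharsets.ISO_8859_1));
        }
        return out;
    }
}
```

The limits array is also the "table of locations" from the first
pass, so later code can work from the offsets and only call
toStrings for the tokens that really need to become strings.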


Also, presumably you find token limits and then make strings.
Is it possible to avoid creating strings at all and just pass
around indexes into a byte array? This may require massive code
changes all over and, depending on what you do with the strings,
may or may not help much, as many common operations may be expected
to be optimized in native code for strings. However, if you have
huge hash tables, each lookup may be cheap to compute but each one
also trashes the memory cache. You may be
better off with ordered index structs, which you can implement in
Java with byte[] more easily than with strings.
And, of course, don't ignore obvious data-dependent optimizations.
If you have strings with a long common prefix like http://www,
then removing it from compares could be a big help with memory and speed.
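
As a rough illustration of the "ordered index on byte[]" idea, the
sketch below keeps a static set of keys sorted as raw bytes and
looks up a token by its offset and length in the input buffer, so
no String is created for the lookup. The class name, the key names
and the ISO-8859-1 assumption are mine, for illustration only. A
byte[][] is also not really compact, since each key is still its
own object; a single byte[] plus an offsets array would be closer
to what I mean, but it makes the example longer.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ByteKeyIndex {
    // Keys sorted in unsigned lexicographic byte order.
    private final byte[][] keys;

    ByteKeyIndex(String... names) {
        keys = new byte[names.length][];
        for (int i = 0; i < names.length; i++) {
            keys[i] = names[i].getBytes(StandardCharsets.ISO_8859_1);
        }
        Arrays.sort(keys, ByteKeyIndex::compareWhole);
    }

    private static int compareWhole(byte[] a, byte[] b) {
        return compare(a, b, 0, b.length);
    }

    // Compares a key with the range buf[off, off + len).
    static int compare(byte[] key, byte[] buf, int off, int len) {
        int n = Math.min(key.length, len);
        for (int i = 0; i < n; i++) {
            int d = (key[i] & 0xFF) - (buf[off + i] & 0xFF);
            if (d != 0) return d;
        }
        return key.length - len;
    }

    // Binary search; returns the position of the key in sorted
    // order, or -1 if the range matches no key.
    int find(byte[] buf, int off, int len) {
        int lo = 0, hi = keys.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int c = compare(keys[mid], buf, off, len);
            if (c < 0) lo = mid + 1;
            else if (c > 0) hi = mid - 1;
            else return mid;
        }
        return -1;
    }
}
```

If all keys share a long common prefix, it can be checked once and
skipped by passing a larger off and smaller len, so the compare
loop never touches those bytes.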



> Best regards,
> Giovanni                                        