Hi Marc,

Thanks for your reply.

I have done some optimization myself, but all of it has been on TI
DSP processors, and TI has *very good* compiler tools (and general
development tools, too). So the question is: is anyone aware of
similar tools (or libraries) for Android?

Looking at my code (which I can't post, for obvious reasons), I'm
fairly certain it is pretty ugly on the memory-usage side. It's
really killing the processor :(
I'll share what I can. I am basically loading two images (as floats)
and filling in a result image, so every pixel touches 3 * 4 = 12
bytes (assuming 4 bytes to a float, 4 bytes to an int, etc., i.e.
standard type widths). So I am accessing not one but *three* images,
and cache misses and evictions are happening. What I am looking to do
now is redesign the algorithm/data structures to minimize cache
misses by favoring both temporal and spatial locality. What I wanted
to find out was whether someone has sort of "been there, done that"
and could provide some pointers.
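The following is a minimal C sketch of one common locality technique for the three-float-image case above. It is illustrative, not taken from the actual code. The technique is strip-mining: instead of running each pass over the whole image, every pass runs over a strip of rows small enough that the strip's rows from all three images stay in cache, and only then does processing move to the next strip. The 32 KB cache size, the function names and the two-pass pipeline (difference, then scale-and-clamp) are all hypothetical assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: 32 KB data cache; check the real L1/L2 size of the target SoC. */
#define CACHE_BYTES (32u * 1024u)

/* How many image rows fit in CACHE_BYTES when every pixel touches
   bytes_per_pixel bytes across all images (2 float inputs + 1 float
   output = 12 with 4-byte floats). Never returns 0, so very wide rows
   still make progress one row at a time. */
static size_t rows_per_strip(size_t width, size_t bytes_per_pixel)
{
    size_t row_bytes = width * bytes_per_pixel;
    size_t rows = (row_bytes > 0) ? CACHE_BYTES / row_bytes : 1;
    return (rows > 0) ? rows : 1;
}

/* HYPOTHETICAL two-pass pipeline (difference, then scale-and-clamp),
   strip-mined: both passes finish one strip of rows before the next
   strip starts, so pass 2 re-reads out[] while it is still in cache
   instead of after the whole image has streamed through. */
void process_strip_mined(const float *a, const float *b, float *out,
                         size_t width, size_t height, float gain)
{
    assert(a != NULL && b != NULL && out != NULL);
    size_t strip = rows_per_strip(width, 3 * sizeof(float));

    for (size_t y0 = 0; y0 < height; y0 += strip) {
        size_t y1 = (height - y0 > strip) ? y0 + strip : height;
        size_t begin = y0 * width;
        size_t end = y1 * width;

        /* Pass 1: row-major, unit-stride reads of both inputs. */
        for (size_t i = begin; i < end; i++)
            out[i] = a[i] - b[i];

        /* Pass 2: reuse the strip pass 1 just wrote; clamp to [0, 1]. */
        for (size_t i = begin; i < end; i++) {
            float v = out[i] * gain;
            out[i] = (v < 0.0f) ? 0.0f : ((v > 1.0f) ? 1.0f : v);
        }
    }
}
```

With a 1024-pixel-wide image and 12 bytes per pixel, the assumed 32 KB cache gives strips of two rows. Both passes here are pointwise, so fusing them into a single loop would be better still. Strip-mining earns its keep when a later pass needs neighboring rows, as a vertical filter does.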

> Java tends toward an allocate-as-you-need memory model, but for
> cache optimizations you need tighter control of memory usage.

Exactly! How do I get that level of control while developing for
Android, with either Java or native C?

Thanks so much,
Amit


On Aug 20, 12:40 am, Marc <[email protected]> wrote:
> I have spent a lot of time optimizing signal processing algorithms
> using both ARM assembly and memory optimizations. The memory
> optimizations can be huge. I remember one project where the speed
> more than doubled once we were using only cache memory; that is, the
> entire working data set was less than 8 KB.
>
> The process is fairly straightforward: determine the size of your
> cache, then use only that much memory. This typically requires
> tricks like making the output buffer overlap the input buffer.
> Basically, you need to know exactly where every byte of memory is
> being used. Java tends toward an allocate-as-you-need memory model,
> but for cache optimizations you need tighter control of memory usage.
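Marc's overlapping-buffer trick can be made concrete with a minimal C sketch. The example is a hypothetical 3-tap smoothing filter that writes its result back into its own input row. The filter, the function name and the repeat-the-border edge handling are assumptions for illustration only. The point is that once output overlaps input, any input value still needed later must be saved before it is overwritten.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL in-place 3-tap box filter over one row: the output
   overwrites the input, so the row costs n floats instead of 2n.
   Writing row[i] destroys the original value that row[i+1]'s average
   still needs, so that original is carried forward in 'prev'.
   ASSUMPTION: edges repeat the border sample. */
void smooth3_in_place(float *row, size_t n)
{
    assert(row != NULL || n == 0);
    if (n < 2)
        return; /* a single sample averages to itself under this edge rule */

    float prev = row[0];                 /* original row[i - 1] */
    for (size_t i = 0; i < n; i++) {
        float cur = row[i];              /* original row[i], saved before overwrite */
        float next = (i + 1 < n) ? row[i + 1] : cur; /* not yet overwritten */
        row[i] = (prev + cur + next) / 3.0f;
        prev = cur;
    }
}
```

Applied to the row {3, 6, 9, 12}, this yields {4, 6, 9, 11}. With 4-byte floats, an 8 KB budget like the one Marc mentions holds a 2048-sample row processed in place. Separate input and output buffers would fit only 1024 samples.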
>
> On Aug 19, 11:47 am, DanH <[email protected]> wrote:
>
> > "Hate to sound like I'm harping on the same stuff, but then (assuming
> > that the JVM/JIT compiler is doing a good enough job), the memory
> > bottleneck still remains."
>
> > Yep, much of our effort on iSeries went into the memory bottleneck
> > area. E.g., we got fairly astounding improvements (ca. 20%) when we
> > "packed" objects so that the fields of "SubclassOfA" filled in the
> > "holes" left from aligning the fields of "A". And we got even more
> > improvement by packing the char array owned by a String into the
> > String itself, arranging it so that the two shared a single header.
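DanH's field-packing point has a direct counterpart in native C. C never reorders struct members, so any packing into alignment holes has to be done by hand. The minimal sketch below compares two hypothetical record types with the same fields declared in different orders. The struct and function names are illustrative assumptions, and the exact sizes depend on the ABI.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>
#include <stdio.h>

/* HYPOTHETICAL record in declaration order: where int32_t is 4-byte
   aligned, 3 padding bytes follow 'tag' and 3 more trail 'flag'. */
struct pixel_rec_padded {
    uint8_t tag;
    int32_t value;
    uint8_t flag;
};

/* Same fields, largest first: the two 1-byte fields sit next to each
   other and share what would otherwise be padding. */
struct pixel_rec_packed {
    int32_t value;
    uint8_t tag;
    uint8_t flag;
};

/* Reordering largest-first can only remove padding, never add it. */
static_assert(sizeof(struct pixel_rec_packed) <= sizeof(struct pixel_rec_padded),
              "reordering should not grow the record");

void print_layouts(void)
{
    printf("padded: size %zu, value at offset %zu\n",
           sizeof(struct pixel_rec_padded),
           offsetof(struct pixel_rec_padded, value));
    printf("packed: size %zu, value at offset %zu\n",
           sizeof(struct pixel_rec_packed),
           offsetof(struct pixel_rec_packed, value));
}
```

On common ARM and x86 ABIs, where int32_t is 4-byte aligned, this typically reports sizes of 12 and 8 bytes. That is a third less memory, and a third fewer cache lines, for an array of such records.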
>
> > (BTW, with regard to alignment, note that most processors can handle,
> > e.g., unaligned ints and longs, but unaligned storage accesses often
> > take several times longer, so alignment may be very important, even
> > if "unnecessary".)
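In native C, DanH's alignment point shows up as a correctness issue as well as a speed issue. Casting a misaligned byte pointer to `uint32_t *` and dereferencing it is undefined behavior. The portable idiom is `memcpy`. The minimal sketch below uses hypothetical function names and an assumed buffer layout of 32-bit values stored back to back from an arbitrary byte offset.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from a possibly unaligned address. Dereferencing
   (const uint32_t *)p is undefined behavior in C when p is misaligned,
   and can fault or be slow on some ARM cores; memcpy is the portable
   idiom, and optimizing compilers usually turn it into a single load
   where the hardware allows unaligned access. */
static uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* HYPOTHETICAL helper: sum n 32-bit values stored back to back starting
   at buf + offset. offset need not be a multiple of 4, but keeping it
   aligned avoids the slower unaligned accesses discussed above. */
uint64_t sum_u32(const unsigned char *buf, size_t offset, size_t n)
{
    assert(buf != NULL || n == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += load_u32(buf + offset + 4 * i);
    return total;
}
```

The same code is correct at any offset. Whether an odd offset actually costs extra time depends on the core, which is why laying data out aligned in the first place is still worthwhile.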
>
> > On Aug 19, 12:54 pm, Amit <[email protected]> wrote:
>
> > > Hi Dan,
>
> > > Thanks for the response.
>
> > > > In general, JITed Java code is as fast as or faster than the
> > > > equivalent native code, if the JIT is reasonably good, and if the
> > > > specific application can be coded efficiently in Java.  
>
> > > I was actually banking on this. I don't know much of the hairy
> > > details (I'm not really a compiler person), but from what I have
> > > read, recent improvements by Google to the Dalvik VM make it
> > > *comparable*, if not equal, in performance to native code...
>
> > > Hate to sound like I'm harping on the same stuff, but then (assuming
> > > that the JVM/JIT compiler is doing a good enough job), the memory
> > > bottleneck still remains.
>
> > > Thanks,
> > > Amit
>
> > > On Aug 19, 10:11 pm, DanH <[email protected]> wrote:
>
> > > > In general, JITed Java code is as fast as or faster than the
> > > > equivalent native code, if the JIT is reasonably good, and if the
> > > > specific application can be coded efficiently in Java. The problem is
> > > > that some specific data processing patterns are not easy to code
> > > > efficiently in Java, and I suspect that certain of the bit-bashing
> > > > algorithms used in image processing fall into this category.
>
> > > > In such cases the most efficient approach is "native Java", but I only
> > > > know of one JVM (the IBM iSeries "classic" JVM) that permits this, and
> > > > then only for system code.  Otherwise it's a bit of a tradeoff to get
> > > > the right partitioning between Java and native, since crossing the
> > > > Java/native boundary tends to be relatively expensive.
>
> > > > On Aug 19, 7:03 am, Fabrizio Giudici <[email protected]>
> > > > wrote:
>
> > > > > On 8/19/10 13:35, Amit wrote:
>
> > > > > > Now, I know that native code will *not* yield any significant
> > > > > > performance improvement over Java code
>
> > > > > Well, specifically for image processing this won't be true, certainly
> > > > > up to and including 2.1 (where the bytecode is purely interpreted);
> > > > > 2.2 has a JIT, but I can't speak to that as I haven't seen it yet.
>
> > > > > --
> > > > > Fabrizio Giudici - Java Architect, Project Manager
> > > > > Tidalwave s.a.s. - "We make Java work. Everywhere."
> > > > > java.net/blog/fabriziogiudici - www.tidalwave.it/people
> > > > > [email protected]

-- 
You received this message because you are subscribed to the Google
Groups "Android Developers" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/android-developers?hl=en
