Hi all,

Thought I would share some more about what I am looking at.

I basically have code something like this:

for ( i=0; i<nPixels; i++ )   {
    // block of code for image processing
}

Now I profile the <block_of_code_for_image_processing> separately
(this gives the per-pixel profile info) and also the whole loop.
That is, I have something like this:

/////////////////////////////////////////////
//            PROFILE 1            //
/////////////////////////////////////////////

t0 = now_ms();
<block_of_code_inside_loop>
t1 = now_ms();
time_c = t1 - t0;

/////////////////////////////////////////////
//            PROFILE 2            //
/////////////////////////////////////////////

t0 = now_ms();
for ( i=0; i<nPixels; i++ ) {
   // block of code
}
t1 = now_ms();
time_c = t1 - t0;

Now the time for PROFILE 1 is around X us. Multiplying X by nPixels, I
get a certain number (actually about 12 seconds).

The time for PROFILE 2, however, is way above nPixels * X: the actual
number is something like 48 seconds, roughly four times the estimate.
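
For concreteness, the two measurements could be sketched in C roughly as
follows. This is only a sketch under assumptions: now_us(),
process_pixel(), time_single_pixel_us() and time_whole_loop_us() are
hypothetical names that stand in for the real timer and the real
per-pixel block, and it assumes POSIX clock_gettime() is available. One
caveat it makes visible: a single pixel takes so little time that an
isolated measurement may sit close to the timer resolution.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Monotonic wall-clock time in microseconds (assumes POSIX clock_gettime). */
static double now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

/* Hypothetical per-pixel kernel; stands in for the real processing block. */
static uint8_t process_pixel(uint8_t p)
{
    return (uint8_t)(255 - p);
}

/* PROFILE 1: time the per-pixel block once, in isolation. */
double time_single_pixel_us(uint8_t *pixels, size_t i)
{
    assert(pixels != NULL);
    double t0 = now_us();
    pixels[i] = process_pixel(pixels[i]);
    double t1 = now_us();
    return t1 - t0;
}

/* PROFILE 2: time the whole loop over all pixels. */
double time_whole_loop_us(uint8_t *pixels, size_t n_pixels)
{
    assert(pixels != NULL);
    double t0 = now_us();
    for (size_t i = 0; i < n_pixels; i++)
        pixels[i] = process_pixel(pixels[i]);
    double t1 = now_us();
    return t1 - t0;
}
```

Dividing the PROFILE 2 result by nPixels gives an effective per-pixel
cost that can be compared directly with the isolated PROFILE 1 figure.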

So obviously (?) I came to the conclusion that the difference is because
of processor stalls due to cache misses, page swaps, etc. between
successive loop iterations.

Now, image processing operations are typically memory intensive.
What I am trying to find out is whether there is a way to design
efficient code that minimizes memory stalls. On embedded processors,
you can use DMA, word alignment, etc. to optimize memory
performance. Obviously, that is not possible here. But is there
something that may be (relatively) sub-optimal, but still better than
naive C coding (or coding for desktop computing)?

I have already done some code refactoring to eliminate redundant
computations, minimize array accesses, and apply other standard
techniques, such as splitting loops based on (assumed) memory access
patterns, keeping dynamic memory allocation to an *absolute minimum*,
and splitting loops to favor row-based operations (over naive i = 0 ->
nPixels kind of implementations) to ensure spatial locality.
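
To illustrate what is meant by row-based operations, here is a minimal
sketch. It assumes an 8-bit, single-channel image stored row by row in
one contiguous buffer; sum_row_major() and sum_column_major() are
hypothetical names used only for this illustration. Both functions
compute the same result, but the first one walks memory contiguously in
its inner loop, while the second jumps by a full row width on every
access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Row-based traversal: the inner loop reads consecutive bytes of one row. */
uint32_t sum_row_major(const uint8_t *img, size_t width, size_t height)
{
    assert(img != NULL);
    uint32_t sum = 0;
    for (size_t y = 0; y < height; y++) {
        const uint8_t *row = img + y * width;  /* start of row y */
        for (size_t x = 0; x < width; x++)
            sum += row[x];
    }
    return sum;
}

/* Column-based traversal: each inner-loop access is `width` bytes apart. */
uint32_t sum_column_major(const uint8_t *img, size_t width, size_t height)
{
    assert(img != NULL);
    uint32_t sum = 0;
    for (size_t x = 0; x < width; x++)
        for (size_t y = 0; y < height; y++)
            sum += img[y * width + x];
    return sum;
}
```

Timing both variants on the target device with the same image would
show how much of the gap is attributable to the access pattern alone.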

Am I asking for too much? (I mean asking for too much given the very
primitive support available.) Does anyone have experience
developing/porting image processing algorithms to embedded hardware in
general (like digital cameras, which would have similar constraints)?

Thanks a lot,
Amit


On Aug 19, 8:12 pm, Amit <[email protected]> wrote:
> Hi,
>
> Thanks for the response.
>
> I apologize if I was vague or confusing.
>
> My question actually was on two issues, and I guess I mixed them up.
>
> The FIRST is of course code performance, and hence the comments
> about the efficacy of native code at improving performance, etc.
>
> But the SECOND, and more important thing that I am trying to get a
> handle on, is about *memory related performance issues*. What I mean
> by that is that, as on any embedded system, processor stalls due to
> cache misses and page swapping would play a big role in performance.
> That is, let us say that certain code (whether in Java or native code)
> takes a certain number of cycles on a cycle-accurate simulator. And
> let us say that the simulator does not give us memory stalls. The difference
> between the profile on such a cycle-accurate simulator and actual
> profile (on the embedded system) would be the memory stalls.
>
> My question is if I can find resources that help me to optimize for
> memory. That is, are there certain 'best practices' when it comes to
> developing applications for image processing algorithms on Android
> (whether in Java or native code)?
>
> On Aug 19, 5:54 pm, Fabrizio Giudici <[email protected]>
> wrote:
>
> > On 8/19/10 14:33 , Amit wrote:
>
> > > Well yes, I only meant that just the fact of using native code
> > > (over Java) won't be very effective. At least that is the
> > > impression I have (which may be wrong).
>
> > > Considering the fact that even native code ultimately runs inside
> > > the Dalvik VM instance, performance gains from use of native code
> > > would be modest, right?
>
> > Things are a bit different. As far as I understand, applications in
> > general only run inside the Dalvik VM, which means that e.g.
> > activities, boot code, etc. are bytecode. In other words, a 100%
> > native app can't exist in Android. But the NDK allows you to create
> > portions of native code that are called by the app. That is, a flow of
> > operations is always started by the VM, but your native code gets
> > executed directly on the processor. This is more or less the same as
> > what happens with JNI in the regular Java JDK.
>
> > Given that, before moving to native code I'd wait for others to share
> > with you their experience specifically with image processing.
>
> > PS It's a shame that Google dropped some imaging back-end classes from
> > Harmony, as there are a number of powerful and complete imaging
> > libraries in Java such as JAI.
>
> > - --
> > Fabrizio Giudici - Java Architect, Project Manager
> > Tidalwave s.a.s. - "We make Java work. Everywhere."
> > java.net/blog/fabriziogiudici -www.tidalwave.it/people
> > [email protected]
