Hi,

I thought I had already posted results for both without --cache-remote,
apologies for missing that... I also timed the actual computing loops (with
Chapel's Timer class) and am posting the average of five runs. All runs were
with GASNET_SPAWNFN=L and two locales.


Standard block dist, with no flags. Average time: 0.84 sec

(get = 4236, get_nb = 0, put = 0, put_nb = 0, test_nb = 0, wait_nb = 0,
 try_nb = 0, fork = 0, fork_fast = 0, fork_nb = 1602)
(get = 823, get_nb = 0, put = 0, put_nb = 0, test_nb = 0, wait_nb = 0,
 try_nb = 0, fork = 1608, fork_fast = 0, fork_nb = 0)


My custom dist, with no flags. Average time: 1.46 sec

(get = 4236, get_nb = 0, put = 0, put_nb = 0, test_nb = 0, wait_nb = 0,
 try_nb = 0, fork = 0, fork_fast = 0, fork_nb = 1602)
(get = 8039, get_nb = 0, put = 0, put_nb = 0, test_nb = 0, wait_nb = 0,
 try_nb = 0, fork = 1608, fork_fast = 0, fork_nb = 0)



Standard block dist, with --no-local --fast. Average time: 0.68 sec

(get = 4236, get_nb = 0, put = 0, put_nb = 0, test_nb = 0, wait_nb = 0,
 try_nb = 0, fork = 0, fork_fast = 0, fork_nb = 1602)
(get = 823, get_nb = 0, put = 0, put_nb = 0, test_nb = 0, wait_nb = 0,
 try_nb = 0, fork = 1608, fork_fast = 0, fork_nb = 0)


My custom dist, with --no-local --fast. Average time: 0.95 sec

(get = 4236, get_nb = 0, put = 0, put_nb = 0, test_nb = 0, wait_nb = 0,
 try_nb = 0, fork = 0, fork_fast = 0, fork_nb = 1602)
(get = 5633, get_nb = 0, put = 0, put_nb = 0, test_nb = 0, wait_nb = 0,
 try_nb = 0, fork = 1608, fork_fast = 0, fork_nb = 0)



I was a bit surprised by how much faster the standard Block dist was, because
in my previous tests the overhead of my dist was significantly smaller. Is it
possible that some change in Chapel git (during the last month) could be
causing this?

I guess it would be simplest to go back to the Block dist and start building
my own dist from it step by step while monitoring performance and comm stats.
After all, the modifications I made were by no means extensive; most of the
time was spent investigating the existing code.
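
For the step-by-step comparison, a small script could do the bookkeeping of
which counters changed between two runs. Below is a minimal Python sketch,
assuming the comm stats are pasted in as strings in the format shown above;
the names parse_comm_stats and diff_comm_stats are made up for illustration
and are not part of Chapel.

```python
import re

def parse_comm_stats(dump):
    """Turn '(get = 1, ...) (get = 2, ...)' into one dict of counters per locale."""
    return [
        {name: int(value)
         for name, value in re.findall(r"(\w+)\s*=\s*(\d+)", group)}
        for group in re.findall(r"\(([^)]*)\)", dump)
    ]

def diff_comm_stats(baseline, candidate):
    """Per locale, keep only the counters that differ, as (baseline, candidate)."""
    return [
        {k: (base[k], cand.get(k, 0))
         for k in base if base[k] != cand.get(k, 0)}
        for base, cand in zip(baseline, candidate)
    ]

# The two --no-local --fast dumps from this mail.
block_dump = (
    "(get = 4236, get_nb = 0, put = 0, put_nb = 0, test_nb = 0, wait_nb = 0, "
    "try_nb = 0, fork = 0, fork_fast = 0, fork_nb = 1602) "
    "(get = 823, get_nb = 0, put = 0, put_nb = 0, test_nb = 0, wait_nb = 0, "
    "try_nb = 0, fork = 1608, fork_fast = 0, fork_nb = 0)"
)
custom_dump = (
    "(get = 4236, get_nb = 0, put = 0, put_nb = 0, test_nb = 0, wait_nb = 0, "
    "try_nb = 0, fork = 0, fork_fast = 0, fork_nb = 1602) "
    "(get = 5633, get_nb = 0, put = 0, put_nb = 0, test_nb = 0, wait_nb = 0, "
    "try_nb = 0, fork = 1608, fork_fast = 0, fork_nb = 0)"
)

changes = diff_comm_stats(parse_comm_stats(block_dump),
                          parse_comm_stats(custom_dump))
for locale_id, changed in enumerate(changes):
    print("locale", locale_id, changed)
```

For these two dumps the only difference is the blocking get count on the
second locale (823 versus 5633), so that is the counter to watch while
reintroducing my changes one at a time.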


Another question: is it possible to use a config param to determine which
modules to use? I tried normal conditionals, but that gives me an "invalid use
of module" error.

Also, I have a constructor defined as

proc MyPolicy(boundingBox: domain,
              param rank = boundingBox.rank,
              type idxType = boundingBox.idxType,
              targetLocs: [] locale = Locales,
              relCutTable: [] real)

How could I give relCutTable a default value of an array filled with 1.0s,
with the same domain as targetLocs?


20.02.2015, 03:03, "Brad Chamberlain" <[email protected]>:
> Hi John --
>
> The comparison that makes the most sense to me are these two:
>>  Standard block dist, with --fast --cache-remote --no-local :
>>
>>  (get = 0, get_nb = 4222, put = 0, put_nb = 0, test_nb = 0, wait_nb = 0,
>>  try_nb = 0, fork = 0, fork_fast = 0, fork_nb = 1602) (get = 0, get_nb = 810,
>>  put = 0, put_nb = 0, test_nb = 0, wait_nb = 0, try_nb = 0, fork = 1604,
>>  fork_fast = 0, fork_nb = 0)
>>  My custom dist with --no-local --fast
>>
>>  (get = 4232, get_nb = 0, put = 0, put_nb = 0, test_nb = 0, wait_nb = 0,
>>  try_nb = 0, fork = 0, fork_fast = 0, fork_nb = 1602) (get = 9629, get_nb =
>>  0, put = 800, put_nb = 0, test_nb = 0, wait_nb = 0, try_nb = 0, fork = 1604,
>>  fork_fast = 0, fork_nb = 0)
>
> Though to make it apples-to-apples, I think you should run both without
> --cache-remote (or both with?).  Specifically, I suspect that the
> conversion of 4222 non-blocking gets (get_nb) in the first case to 4232
> blocking gets (get) in the second is due to this flag.  Otherwise the
> first locale's comm statistics look pretty similar between the two, so
> there don't seem to be any real surprises there.
>
> The second locale's comm results are pretty weird, though: ~800 gets are
> becoming ~9000 gets and 800 puts.  At a glance, this suggests to me that
> something that ought to be on locale #1 is actually on locale #0.  The
> fact that the number of puts in the second case is equal to the number of
> gets in the first seems particularly suspicious.  While it may be that
> --cache-remote is playing a role here, it may also simply be that
> something isn't stored where you'd expect (or getting optimized as you'd
> expect).  But running both versions in a similar --cache-remote mode would
> remove that question mark.
>
> I know the original authoring of the Block routine took a number of
> simple-but-ugly steps to localize data that the compiler wouldn't do
> automatically (most of which, one might expect it ultimately to do).  I
> haven't reviewed those tricks in years to determine whether they are still
> necessary, but it could be that a seemingly innocuous software engineering
> refactoring would result in some meta-data being remote rather than local.
>
> Unfortunately, there aren't any high-level tools to help determine this,
> so the tricks we usually take are to narrow the calipers on the
> communication count routines and study smaller and smaller sections of
> code; or to put things like "writeln(x.locale)" or "writeln(here)" or
> "assert (x.locale == here)" or "writeln(this.locale)" into the key
> routines (like dsiAccess or the 'these' iterators) to make sure that a
> given variable is stored where we'd expect, or that we're running where
> we'd expect, or that the two locations are the same.
>
> -Brad

_______________________________________________
Chapel-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-users
