You are right: while reading the configuration file we bail out if the message rules do not start with a rule for zero-length messages. The selection logic (where I was looking) has no such restriction, since it automatically selects the first rule. Even so, I consider that forcing the communicator-based rules to start with a rule for zero-length messages solves all the issues and provides an intuitive approach, one where the user has to cover the entire message spectrum.

George.

On Wed, May 20, 2015 at 11:25 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

> George,
>
> i understand the logic (even if i still find it counter intuitive, but
> this is another story)
>
> if a rule for zero-sized messages is not needed, then there is a bug ...
>     if (!nms && MS) {
>         OPAL_OUTPUT((ompi_coll_tuned_stream,"All algorithms must specify a rule for message size of zero upwards always first!\n"));
>         OPAL_OUTPUT((ompi_coll_tuned_stream,"Message size was %lu for collective ID %d com rule %d msg rule %d at around line %d\n", MS, CI, ncs, nms, fileline));
>         goto on_file_error;
>     }
>
> Cheers,
>
> Gilles
>
> On Thu, May 21, 2015 at 12:04 PM, George Bosilca <bosi...@icl.utk.edu>
> wrote:
>
>> Gilles,
>>
>> There is no need to define a rule for zero-sized messages; it is
>> implicitly matched by the first rule. To be extremely pedantic, the
>> selection logic for the communicator size and the message size is identical,
>> albeit written differently. Both start by selecting rule 0, and then
>> work their way up in the corresponding sizes (communicator or messages),
>> moving the matched rule until the condition fails (size < rule size).
>>
>> Hopefully this clarifies why in your example the 2 proc communicators are
>> using the rule for 4.
>>
>> Using 0 as index for an algorithm selection redirects the decision to the
>> default, hard-coded, coll_tuned decision function, allowing the dynamic
>> rules to fall back to the predefined behavior.
>>
>> George.
>>
>>
>> On Wed, May 20, 2015 at 8:10 PM, Gilles Gouaillardet <gil...@rist.or.jp>
>> wrote:
>>
>>> George,
>>>
>>> first i'd like to amend my initial message.
>>> i previously wrote the same algo is used to parse rules per communicator
>>> size and per message size.
>>> this is true, but i missed the part where it is mandatory to define a
>>> rule for zero size message.
>>> consequently, a given message is either in an interval or its size is
>>> greater than or equal to the size of the last rule for a given communicator.
>>>
>>> there is no such thing for communicator size.
>>> for example, if the config file is
>>>     comm size 4 => rules A
>>>     comm size 8 => rules B
>>> communicators of size 2, 4 and 6 will all use rule A.
>>> this is very intuitive for comm size 4 and 6, but at first glance, comm
>>> size 2 is in a grey area.
>>>
>>> another option would be to force the rule file to have a rule for
>>> communicators of size 0 (or 1 or two).
>>>
>>> bottom line, the rules must be sorted by comm size and message size by
>>> design, and that looks fair to me.
>>> however, there is a grey area for small communicators and i think it
>>> should be cleared.
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>>
>>> On 5/21/2015 1:04 AM, George Bosilca wrote:
>>>
>>> Each rule defines an interval with the previous rule, and everything in
>>> an interval will be bound to the rule with the next message size. You
>>> cannot define a rule for a specific amount. Thus, the fact that the rules
>>> must be ordered by message size was done by design.
>>>
>>> Returning a NULL rule as suggested by Howard is even more confusing, as
>>> with this approach you don't even know what is used (as it will
>>> automatically fall back to the default decision).
>>>
>>> George.
>>>
>>>
>>> On Tue, May 19, 2015 at 11:57 PM, Howard Pritchard <hpprit...@gmail.com>
>>> wrote:
>>>
>>>> Hi Gilles,
>>>>
>>>> First a disclaimer - I do not know what the intended design was nor
>>>> where the design document
>>>> for this feature is located.
>>>>
>>>> However, I would certainly prefer that if the communicator size
>>>> wasn't specifically specified
>>>> in the rule file, a fall back do-no-harm algorithm would be selected.
>>>>
>>>> Following the KISS principle I would go with 2) returning a NULL rule
>>>> when
>>>> there is no matching size in the rule file for the communicator in
>>>> question.
>>>>
>>>> Howard
>>>>
>>>>
>>>> 2015-05-19 20:05 GMT-06:00 Gilles Gouaillardet <gil...@rist.or.jp>:
>>>>
>>>>> Folks,
>>>>>
>>>>> this is a follow-up of a discussion on the user ML started at
>>>>> http://www.open-mpi.org/community/lists/users/2015/05/26882.php
>>>>>
>>>>> 1) it turns out the dynamic rule file must be "sorted" :
>>>>> - rules must be sorted by communicator size
>>>>> - within a given communicator size, rules must be sorted by message
>>>>> size
>>>>>
>>>>> if not, some rules are silently skipped, which is counter intuitive
>>>>> imho.
>>>>>
>>>>>
>>>>> 2) the algo picks the rule with the highest communicator size less than or
>>>>> equal to the current communicator size (same thing for message size).
>>>>> The exception is if there is no such rule, the first rule is selected.
>>>>> for example, if the config file has rules for comm size 4, 8 and 16
>>>>>     comm size 4 => pick rule for comm size 4
>>>>>     comm size 5 => pick rule for comm 4
>>>>>     comm size 8 => pick rule for comm 8
>>>>> *but*
>>>>>     comm size 2 => pick rule for comm size 4 (!)
>>>>> imho, this is also counter intuitive.
>>>>> i would have expected no rule is picked and the default behaviour is
>>>>> used.
>>>>>
>>>>> Same thing applies for message sizes.
>>>>>
>>>>> Is this the intended design ?
>>>>>
>>>>> 1) can be solved by inserting some qsort calls after parsing the
>>>>> config file.
>>>>> 2) can be solved by returning a NULL rule instead of the first rule (
>>>>> or by automatically inserting a rule for comm size 0 (and message size 0)
>>>>> if no such rule is present in the config file).
>>>>>
>>>>> any thoughts ?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> Link to this post:
>>>>> http://www.open-mpi.org/community/lists/devel/2015/05/17425.php
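
As a reading aid for the thread above, the selection behaviour it describes (start at rule 0, move up while the size is large enough) and the parse-time requirement that message rules begin at size zero can be sketched in C. This is a simplified illustration under stated assumptions, not the coll_tuned source: `rule_t`, `select_rule` and `rules_cover_from_zero` are hypothetical names introduced here, and the strict-ordering check is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified rule: it applies from `size` upwards,
 * until the threshold of the next rule is reached. */
typedef struct {
    size_t size; /* communicator size or message size threshold */
    int    alg;  /* algorithm id (illustrative values only) */
} rule_t;

/* Return the last rule whose threshold is <= size.
 * Assumes `rules` is sorted by ascending size. As described in the
 * thread, the search starts at rule 0, so a size smaller than every
 * threshold still matches the first rule. */
const rule_t *select_rule(const rule_t *rules, size_t n, size_t size)
{
    assert(rules != NULL || n == 0);
    if (n == 0) {
        return NULL; /* no rules at all: caller falls back to its default */
    }
    const rule_t *best = &rules[0];
    for (size_t i = 1; i < n; i++) {
        if (size < rules[i].size) {
            break; /* condition fails: keep the previously matched rule */
        }
        best = &rules[i];
    }
    return best;
}

/* Parse-time style check (assumption of this sketch): the first rule
 * must have threshold 0 and thresholds must be strictly ascending, so
 * that every size falls into exactly one interval. Returns 1 if the
 * rules are valid, 0 otherwise. */
int rules_cover_from_zero(const rule_t *rules, size_t n)
{
    if (n == 0 || rules[0].size != 0) {
        return 0;
    }
    for (size_t i = 1; i < n; i++) {
        if (rules[i].size <= rules[i - 1].size) {
            return 0;
        }
    }
    return 1;
}
```

With thresholds 4, 8 and 16, `select_rule` returns the rule for 4 when asked about size 2, which is the grey area raised in the thread. Requiring the first threshold to be 0, as `rules_cover_from_zero` does, removes that grey area because every size then falls inside an explicitly declared interval.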