I was playing around with some really silly fragment sizes (sub-72 bytes) when I ran into some asserts in btl_openib_sendi. I traced the asserts to mca_pml_ob1_send_request_start_btl(), which calculates the true eager_limit with the following line:

  size_t eager_limit = btl->btl_eager_limit - sizeof(mca_pml_ob1_hdr_t);

If btl_eager_limit ends up being less than sizeof(mca_pml_ob1_hdr_t), the subtraction wraps around because size_t is unsigned, so the calculated eager_limit is a very large number, which trips an assert later on in the stack.

It seems to me that it would be nice to insert some checks in mca_btl_base_param_register() to make sure btl_eager_limit is > sizeof(mca_pml_ob1_hdr_t). Am I missing a reason why this was not done in the first place?
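
To make the failure mode concrete, here is a minimal standalone sketch in C of the wraparound and of the kind of guard I have in mind. It is not Open MPI code: payload_limit_unchecked() and payload_limit_checked() are hypothetical names, and the header size is passed in as a plain number rather than taken from sizeof(mca_pml_ob1_hdr_t).

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the subtraction quoted above. size_t is unsigned, so when
 * eager_limit < hdr_size the result wraps around to a huge value
 * instead of going negative. (Hypothetical helper, not Open MPI code.) */
size_t payload_limit_unchecked(size_t eager_limit, size_t hdr_size)
{
    return eager_limit - hdr_size;
}

/* Guarded variant (also hypothetical): returns the usable payload size,
 * or 0 if eager_limit cannot hold a header plus at least one byte.
 * A valid result is always > 0, so 0 is an unambiguous error value. */
size_t payload_limit_checked(size_t eager_limit, size_t hdr_size)
{
    size_t payload;

    if (eager_limit <= hdr_size) {
        return 0; /* reject: the subtraction would wrap or leave no room */
    }
    payload = eager_limit - hdr_size;
    assert(payload <= eager_limit); /* post-condition: no wraparound */
    return payload;
}
```

With a made-up header size of 72 bytes, payload_limit_unchecked(64, 72) returns a value far larger than 64, while payload_limit_checked(64, 72) returns 0 so the caller can reject or adjust the parameter at registration time. Whether the real check belongs in mca_btl_base_param_register() or somewhere else is exactly what I am asking.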

--td
