Okay, I've traced this down. The problem is that a DSS-internal function has
been exposed via the API, so now people can mistakenly call the wrong one.
You should -never- be using opal_dss.pack_buffer or opal_dss.unpack_buffer.
Those were supposed to be internal to the DSS only, and will definitely mess
you up if called directly.
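
For the record, the supported entry points are opal_dss.pack and opal_dss.unpack.
A rough sketch (not compiled, illustrative values only):

opal_buffer_t *buffer = OBJ_NEW(opal_buffer_t);
int32_t value = 42;                              /* example payload */

opal_dss.pack(buffer, &value, 1, OPAL_INT32);    /* supported API */
/* NOT opal_dss.pack_buffer() - that one is DSS-internal */

int32_t out, n = 1;
opal_dss.unpack(buffer, &out, &n, OPAL_INT32);
OBJ_RELEASE(buffer);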

I'll fix this problem to avoid future issues. There is a comment in dss.h
that warns you never to call those functions, but who would remember?

I sure wouldn't. I've only avoided the problem because of ignorance - I
didn't know those APIs existed!

Should have a fix in later today.
Ralph



On 6/19/08 8:43 AM, "Ralph H Castain" <r...@lanl.gov> wrote:

> WOW! Somebody really screwed up the DSS by adding some new APIs I'd never
> heard of before, but they really can cause the system to break!
> 
> I'm going to have to straighten this mess out - it is a total disaster.
> There needs to be just ONE way of packing and unpacking, not two totally
> incompatible methods.
> 
> Will let you know when it is fixed - probably early next week.
> Ralph
>  
> 
> 
> On 6/19/08 8:34 AM, "Leonardo Fialho" <lfia...@aomail.uab.es> wrote:
> 
>> Hi Ralph,
>> 
>> My mistake, I'm really using ORTE_PROC_MY_DAEMON->jobid.
>> 
>> I had success using pack_buffer()/unpack_buffer() and the OPAL_BYTE type,
>> but something strange occurs when I use pack()/unpack(): the value of
>> num_bytes increases. For example, I tried to read num_bytes=5, and after
>> an unpack this variable holds 33! I don't understand it...
>> 
>> Thanks,
>> Leonardo Fialho
>> 
>> Ralph Castain wrote:
>>> 
>>> On 6/17/08 3:35 PM, "Leonardo Fialho" <lfia...@aomail.uab.es> wrote:
>>> 
>>>   
>>>> Hi Ralph,
>>>> 
>>>> 1) Yes, I'm using ORTE_RML_TAG_DAEMON with a new "command" that I
>>>> defined in "odls_types.h".
>>>> 2) I'm packing and unpacking variables like OPAL_INT, OPAL_SIZE, ...
>>>> 3) I'm not blocking the "process_commands" function with long-running code.
>>>> 4) To know the daemon's vpid and jobid, I used the same jobid as the
>>>> app (this can be changed in this solution) and the vpid is assigned
>>>> sequentially (0 for mpirun and 1 to N for the orteds).
>>>>     
>>> 
>>> The jobid of the daemons is different from the jobid of the apps. So at the
>>> moment, you are actually sending the message to another app!
>>> 
>>> You can find the jobid of the daemons by extracting it as
>>> ORTE_PROC_MY_DAEMON->jobid. Please note, though, that the app has no
>>> knowledge of the contact info for that daemon, so this message will have to
>>> route through the local daemon. Happens transparently, but just wanted to be
>>> clear as to how this is working.
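>>>
>>> In other words, something like this (a rough, untested sketch -
>>> target_vpid and buffer here are just placeholders):
>>>
>>> orte_process_name_t daemon;
>>> daemon.jobid = ORTE_PROC_MY_DAEMON->jobid;   /* the daemon job, not the app job */
>>> daemon.vpid  = target_vpid;                  /* vpid of the daemon you want */
>>>
>>> orte_rml.send_buffer(&daemon, buffer, ORTE_RML_TAG_DAEMON, 0);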
>>> 
>>>   
>>>> The problem is: I need to send buffered data, and I don't know the
>>>> type of this data. I'm trying to use OPAL_NULL and OPAL_DATA_VALUE to
>>>> send it but got no success... :(
>>>>     
>>> 
>>> If I recall correctly, you were trying to archive messages that flowed
>>> through the PML - correct? I would suggest just treating them as bytes and
>>> packing them as an opal_byte_object_t, something like this:
>>> 
>>> opal_byte_object_t bo;
>>> 
>>> bo.size = num_bytes;   /* number of bytes in my_data */
>>> bo.data = my_data;     /* uint8_t* pointing at the bytes */
>>> 
>>> opal_dss.pack(buffer, &bo, 1, OPAL_BYTE_OBJECT);
>>>  
>>> Then on the other end:
>>> 
>>> opal_byte_object_t *bo;
>>> int32_t n = 1;          /* we expect one byte object */
>>> 
>>> opal_dss.unpack(buffer, &bo, &n, OPAL_BYTE_OBJECT);
>>> 
>>> You can then transfer the data into whatever storage you like. All this does
>>> is pass the #bytes and the bytes as a collected unit - you could, of course,
>>> simply pass the #bytes and bytes with independent packs if you wanted:
>>> 
>>> int32_t num_bytes;
>>> uint8_t *my_data;
>>> 
>>> opal_dss.pack(buffer, &num_bytes, 1, OPAL_INT32);
>>> opal_dss.pack(buffer, my_data, num_bytes, OPAL_BYTE);
>>> 
>>> ...
>>> 
>>> int32_t n = 1;
>>> opal_dss.unpack(buffer, &num_bytes, &n, OPAL_INT32);
>>> my_data = (uint8_t*)malloc(num_bytes);
>>> opal_dss.unpack(buffer, my_data, &num_bytes, OPAL_BYTE);
>>> 
>>> 
>>> Up to you.
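>>>
>>> One more note: all of these calls return a status code, so in real code
>>> you'll want to check it. A rough sketch (untested):
>>>
>>> int rc;
>>> if (OPAL_SUCCESS != (rc = opal_dss.unpack(buffer, &num_bytes, &n, OPAL_INT32))) {
>>>     ORTE_ERROR_LOG(rc);
>>>     return rc;
>>> }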
>>> 
>>> Hope that helps
>>> Ralph
>>> 
>>>   
>>>> Thanks in advance,
>>>> Leonardo Fialho
>>>> 
>>>> 
>>>> Ralph H Castain wrote:
>>>>     
>>>>> I'm not sure exactly how you are trying to do this, but the usual
>>>>> procedure
>>>>> would be:
>>>>> 
>>>>> 1. call opal_dss.pack(*buffer, *data, #data, data_type) for each thing you
>>>>> want to put in the buffer. So you might call this to pack a string:
>>>>> 
>>>>> opal_dss.pack(*buffer, &string, 1, OPAL_STRING);
>>>>> 
>>>>> 2. once you have everything packed into the buffer, you send the buffer
>>>>> with
>>>>> 
>>>>> orte_rml.send_buffer(*dest, *buffer, dest_tag, 0);
>>>>> 
>>>>> What you will need is a tag that the daemon is listening on that won't
>>>>> interfere with its normal operations - i.e., what you send won't get held
>>>>> forever waiting to get serviced, and your servicing won't block us from
>>>>> responding to a ctrl-c. You can probably use ORTE_RML_TAG_DAEMON, but you
>>>>> need to ensure you don't block anything.
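>>>>>
>>>>> Putting it together, the send side would look roughly like this (an
>>>>> untested sketch - daemon_name is a placeholder for whatever
>>>>> orte_process_name_t you resolve for the target daemon):
>>>>>
>>>>> opal_buffer_t *buffer;
>>>>> char *my_string = "hello";
>>>>>
>>>>> buffer = OBJ_NEW(opal_buffer_t);
>>>>> opal_dss.pack(buffer, &my_string, 1, OPAL_STRING);
>>>>> orte_rml.send_buffer(&daemon_name, buffer, ORTE_RML_TAG_DAEMON, 0);
>>>>> OBJ_RELEASE(buffer);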
>>>>> 
>>>>> BTW: how is the app figuring out the name of the remote daemon? The proc
>>>>> will have access to the daemon's vpid (assuming it knows the nodename
>>>>> where
>>>>> the daemon is running) in the ESS, but not the jobid - I assume you are
>>>>> using some method to compute the daemon jobid from the apps?
>>>>> 
>>>>> 
>>>>> On 6/17/08 12:08 PM, "Leonardo Fialho" <lfia...@aomail.uab.es> wrote:
>>>>> 
>>>>>   
>>>>>       
>>>>>> Hi All,
>>>>>> 
>>>>>> I'm using RML to send log messages from a PML to an ORTE daemon (located
>>>>>> on another node). I succeeded in sending the message header, but now I
>>>>>> need to send the message data (buffer). How can I do it? The problem is
>>>>>> which data type I need to use for packing/unpacking? I tried
>>>>>> OPAL_DATA_VALUE but had no success...
>>>>>> 
>>>>>> Thanks,
>>>>>> Leonardo Fialho