Hi Jeff,

Thanks for the excellent review.    :-)

To keep things in context I will try to embed my responses below...


On 6/3/2010 4:09 PM, Jeff Epler wrote:
> I don't have any real modbus hardware to test this on, but if there are
> problems with the classicladder modbus function I'd like to get the fix
> into a future 2.4.x release.  Since classicladder is Chris Morley's
> project, I feel that it's primarily up to him what patch should
> ultimately go in.
>
> Chris or anyone else, do you have hardware to test this on, particularly
> hardware from a different vendor than the Siemens PLC that Dave tested?
> I have an Arduino with FreeMODBUS firmware and may be able to
> temporarily use an Automation Direct VFD.  (I was able to use function 4
> on the Arduino in classicladder before Dave's modifications; I haven't
> tried with the patch, or any functions besides 4)
>
> Before I get in to my remarks about the proposed patch, I want to thank
> Dave for doing the work and sharing the results of that work.  He fixes
> important bugs and I want to make a big deal out of things like where he
> adds spaces!  But code organization really is important -- probably the
> idiosyncratic way that much of classicladder is written was tough for
> Dave to understand the first time (I know it has been for me as I read
> the existing code while reviewing this patch).

You are very welcome....  this is the least I can do considering all of 
the help you guys have provided..

> We can improve this over
> time, but only by holding new submissions to a higher standard than is
> seen in the existing code.
>
>    

I agree entirely.   I had to resist the temptation to start 
rewriting/cleaning up other parts of the code as I found a lot of it 
very hard to read - and for no good reason.
I really didn't know where to stop, but I decided to try and not make 
any unnecessary changes .. although I didn't succeed in doing that 
entirely of course...


>> Because I removed all hard coded timers, the Modbus Com thread can now
>> put significant load on the CPU if run without either a delay after
>> transmit or an inter transaction delay.
>>      
> This is particularly worrisome to me.  This means you're busy-waiting,
> and you've identified the consequence of it.  I didn't see what part of
> your patch busy-waits but it must be there.
>
>    

I don't think it is really busy waiting - as in doing a empty while loop 
to wait.  Unless there is work to do, the while loop enters a 
DoPauseMilliSecs pause, which calls a nanosleep function which suspends 
the thread which is running the SocketModbusMasterLoop function .    
That is the near to last function in socket_modbus_master.c.

What I believe I saw in the original code was a pause of 10 ms every 
scan of the Modbus loop function regardless if there was work to be done 
or not.  That made no sense to me.   If there is work to be done, I 
don't think I want the thread to be put to sleep.

> More notes about specific parts of the proposed change:
>
>    
>> +/* Modified by Dave Cole 5/24/10 to increase serial receive reliability,
>> +   correct errors in EMC2 implementation version*/
>>      
> There's no need to add comments like these.  The git history will
> record who made the change and the reason given for the change.
> Repeating it in the source file is unhelpful.  (lots of my notes, such
> as this one, apply in multiple places)
>
>    

No problem..  I didn't know how changes like this were managed.

>> @@ -40,8 +43,6 @@
>>   #endif
>>   GtkWidget *LabelParam[ NBR_OBJECTS ],*ValueParam[ NBR_OBJECTS ];
>>
>> -
>> -
>>   #define NBR_IO_PARAMS 6
>>   GtkWidget *InputParamEntry[ NBR_INPUTS_CONF ][ NBR_IO_PARAMS ];
>>   GtkWidget *InputDeviceParam[ NBR_INPUTS_CONF ];
>>      
> Try to avoid making unrelated whitespace changes
>
>    

Question:   As you are poking through code like this to try and 
determine what to do, making changes, trying things out .... how do you 
not disturb the whitespace?

I tend to reorganize the whitespace to make the program clearer for 
me....  even if it is not standard convention.   But by doing that it is 
near impossible to put the whitespace back to where it was ...

Do you use one Git clone directory for the modifications, then go back 
and splice your changes into a virgin git clone directory before doing a 
patch?

It sure would be nice if the patch creation process was more 
intelligent.   But apparently it is not, and I really don't want to 
create patch noise due to the way I work.  :-)


>> -static char * SerialSpeed[] = { "300", "600", "1200", "2400", "4800", 
>> "9600", "19200", "38400", "57600", "115200", NULL };
>> +static char * SerialSpeed[] = { /*"300",*/ "600", "1200", "2400", "4800", 
>> "9600", "19200", "38400", "57600", "115200", NULL }; // Dave Cole
>>      
> Unless we're nearly certain no real users use 300 baud (i.e., because it
> never could have worked before (why?)), we should leave it in.
>
>    

I'd be pretty confident that no one is using EMC2 with a 300 baud Modbus 
connection....  but then again I might be wrong!  ?     I'm having a 
hard time understanding how 300 baud would be useful.  If we need to 
support 300 baud then shouldn't we also support 110 baud?  :-)   
Honestly, I have no idea why 300 baud did not work.  It may not be 
functional in the original code either.  When I tested it at 300 baud, 
errors occurred that I did not see at any other baud rates.  I thought 
it would be best to simply take out the option.  Personally I can't see 
Modbus being useful below 1200 baud.  But if you want me to look at it 
closer I will.


>    
>> -                            sprintf( BuffLabel, "After transmit pause - 
>> milliseconds" );
>> +                            sprintf( BuffLabel, "Dwell between Xmt and 
>> Rcv-millisecs" );
>>      
> I don't think these wording changes are improvements.  "pause" and
> "delay" are better than "dwell", and the original didn't have to
> abbreviate "Xmt" and "Rcv".
>    

This verbiage change is not really important to me one way or the 
other.   I didn't think the descriptions were accurate.  A pause after 
receive doesn't really describe a delay between modbus transactions.
And a pause after transmit really doesn't describe a pause between 
transmit and a receive within a transaction..   No big deal either 
way.   My goal was to get Modbus to run out of the box without "tuning" 
these timers.
I'd be happy to revert these text changes..

>    
>> -            MessageInStatusBar("Openned Configuration window. Press again 
>> to update changes and close");
>> +            MessageInStatusBar("Opened Configuration window. Press again to 
>> update changes and close");
>>      
> This is an obvious typo fix which is fine but even better as a separate
> commit.
>
>    

Ok.   There are a lot of typos in this code, probably due to language 
differences.    Many of the function names also have typos, which makes 
changing the code a little more difficult if you aren't cutting and 
pasting.
Should typos and code reformatting normally be a separate commit/patch?

>> -            LgtResp++;//for function code
>> +            //  LgtResp++;//for function code  where does LgtResp come 
>> from????
>>      
> Delete code, don't comment it out.
>    

I totally agree this is not right..
Again this goes back to how do you guys keep your hacked code fixes 
separate from the code you create your patches from?

> Don't add comments that mostly reflect that the programmer didn't
> understand the code, such as 'where does LgtResp come from????'.
>    

Yes... again junk I left in the code...   notes for myself that 
shouldn't be in the patch file..

> Incidentally, it was declared just above at the top of the function and
> initialized to zero:
> |        int LgtResp = 0, NbrRealBytes;
> but like you I find the manner in which the final value of LgtResp is
> computed to be hard to love.
>
>    

Jeff.....  you are being nice .. ;-)

>>              switch( CurrentFuncCode )
>>              {
>>      
> [changes snipped]
>
> It was hard for me to see what changes you intended here.  I wrote a
> test program which incorporates the previous definition of this
> function, your modified version, and a function I wrote from scratch
> based on reading a modbus protocol documentation.
>
> This table summarises the functions for values where they differ (27
> cases in all were tested and the rest agreed):
>
> func size    orig dave jeff
> 0x05    1 ->     6    5    5  # FORCE_COIL
>
> 0x0f    1 ->     6    2    5  # FORCE_COILS
> 0x0f    7 ->     6    2    5
> 0x0f    8 ->     6    2    5
> 0x0f   15 ->     6    2    5
>
> 0x06    1 ->     6    5    5  # WRITE_REG
>
> 0x10    1 ->     6    5    5  # WRITE_REGS
> 0x10    7 ->     6    5    5
> 0x10    8 ->     6    5    5
> 0x10   15 ->     6    5    5
>
> 0x08    1 ->     6    4   -1  # DIAGNOSTICS
>
> So in most cases your new code agrees with mine (0x5, 0x6, 0x10).
> We disagree on 0xf and 0x8.
>
> First, 0xf, Write Multiple Coils AKA Force Coils.  According
> to the documentation I am reading, the Response PDU consists of
>      1 byte  - function code (0xf)
>      2 bytes - starting address
>      2 bytes - quantity of outputs
> giving a total of 5 bytes.  How did you arrive at 2 bytes?  Did your
> testing cover any MODBUS_FC_FORCE_COILS requests?
>
>    

I'll have to go back and look at this again .... but I must admit 
laziness..   The way that I found out which functions where not creating 
the correct return byte count was by running each Modbus function in 
Classic Ladder against a Modbus simulator program running
in a different PC.   The Modbus simulator program displayed each message 
send and received, would flag bad messages received  etc.  When it did 
get a good request, it would also display it's response so when the 
Classic Ladder Modbus code faulted - I went and counted the length of 
the response bytes sent by the simulator and correct the Classic Ladder 
C code to the proper byte count.    I was assuming that the Modbus 
simulator software was working properly.    The software is quite 
popular (and open source) so I really didn't have any reason to believe 
that it was buggy.

Regarding 0x0f, you are right.   I looked at the specs.  There is 
something wrong there ....  I can mockup that test again easily to 
resolve that...  perhaps there is a bug in the Modbus simulator code for 
Force coil writes?    I can check that on the Siemens PLC also, I just 
need to alter the PLC program to allow it..  But that should be done.

I tested all Classic Ladder Modbus functions (except the 0x8 Diagnostic 
function)   against the Windows based open source simulator software 
running on a different PC at all baud rates from 300 to 115K baud..  I 
was using a null modem cable and a USB to RS232 converter on the Windows 
PC end.  I had recurrent problems at 300 baud.  I tested the register 
read and write functions with the Siemens S7-200.

The test connections looked like this:

Modbus simulator PC ----- USB to RS232 converter --- Null modem cable 
---- EMC2 system running 2.4 on Ubuntu 9.10 with onboard RS232

S7-226 Siemens PLC ---- RS485 to RS232 converter  -----   EMC2 system 
running 2.4 on Ubuntu 9.10 with onboard RS232

> Next, 0x8, Diagnostics.  According to the documentation I am reading,
> the response consists of
>      1 byte  - function code (0x8)
>      2 bytes - subfunction code
>      ? bytes - data (depends on subfunction and data in request)
> therefore, I don't believe it is possible to calculate the number of
> bytes in a diagnostics reply PDU based only on the function code and
> number of elements.  I also don't see an opportunity to send a
> Diagnostics request in classicladder.  Did your testing cover any
> MODBUS_FC_DIAGNOSTICS requests?
>
>    
>> +                            case MODBUS_FC_FORCE_COILS:  //14
>>      
> Don't add comments that purport to give the value of a constant or
> enumerant.  It's even worse when the comment is incorrect, as it is
> here:
>      protocol_modbus_master.h:#define MODBUS_FC_FORCE_COILS 15
>
>    

Ok

- FindNextReqFromTable( );
>> +                    FindNextReqFromTable( );
>>      
> these two lines differ because the "+" line has added whitespace.  Try
> to avoid making whitespace changes, because they make diffs noisy.
>
>    
>> @@ -193,46 +201,147 @@ void SerialSend( char *Buff, int BuffLength )
>>   void SerialSetResponseSize( int Size, int TimeOutResp )
>>   {
>>      if ( PortIsOpened )
>> -    {
>> -            newtio.c_cc[VMIN] = Size; //Nbr chars we should receive;
>> -            newtio.c_cc[VTIME] = TimeOutResp/100; // TimeOut in 0.1s
>> -//          tcflush(fd, TCIFLUSH);
>> +    {         // "fd" (file device) control for serial port per termios docs
>> +        newtio.c_cc[VMIN] = Size; //Nbr chars we should receive  (This 
>> doesn't seem to work, read returns immediately, regardless)
>> +            newtio.c_cc[VTIME] = TimeOutResp/100; // TimeOut in 0.1s (Read 
>> returns immediately and doesn't block as it should? )
>>              if ( ModbusDebugLevel>=2 )
>>                      printf("Serial config...\n");
>> -            tcsetattr(fd,TCSANOW,&newtio);
>> +            tcsetattr(fd,TCSANOW,&newtio);// TCSANOW - means make changes 
>> immediately
>>      }
>>   }
>>      
> Did you ever look at the value actually stored in VTIME as I suggested?
>    

I did not actually interrogate VTIME after it was set, I simply did a 
print just prior to setting the value.

> One thing that occurs to me as I read it now is that for TimeOutResp
> less than 100 it's setting VTIME to 0.  (TimeOutResp+99)/100 will round
> up instead.
>
> The other reason for the old code to fail would have been bad (too
> small) lengths calculated in GetModbusResponseLenghtToReceive but in all
> the cases I examined the original code returned the same value or a
> longer value than the new code, so this is probably a red herring.
>    

Jeff,  The interesting/disturbing thing I found out about VTIME and 
VMIN, is that although I could find docs that said how it should would 
work in "non-canonical" mode using VTIME and VMIN but I could never find 
an actual code example
where they didn't use some loop to scavenge the code out of the input 
buffer.    If the non-canonical mode of the serial read in termios is 
functioning properly then I would think that simply setting up a read 
with VTIME and VMIN would be the obvious and simpler
solution, yet I don't find that being implemented in what I looked at. 
    If VTIME is zero, then the timer is not functional so the read is 
suppose to block forever until VMIN characters are received.  I tried 
some experiments and I could never get the read to block for any number 
of characters.   The read would always return immediately with whatever 
characters it could scavenge out of the input buffer.     Not using a 
while loop and using the read with the VMIN and VTIME would be a lot 
more elegant solution, but I can't get it to work.   The other Modbus 
library that was used for the GS2 Modbus connection, uses the same while 
loop scheme that is implemented in this patch.

>    
>> +int SerialReceive( char * Buff, int framelength , int TimeOutResp )
>>      
> the body of this function is essentially all-new so you should be
> consistent on indentation throughout.
>    

I guess I need to go back and look at that ... as I thought that it was.

> [most of body snipped except where I want to comment]
>    
>>      if ( PortIsOpened )
>>      
> .. exits without returning a value if this condition is false
>
>    
>> -tv.tv_usec = newtio.c_cc[VTIME]*100 *1000; //micro-seconds
>>      
> I still think that the old function seemed "pretty right", but my VTIME
> concern is repeated here.  If VTIME was calculated as 0 above, tv will
> be {0,0} indicating that select should return immediately.  Using
> TimeOutResp * 1000 here would make more sense.
>
>    
>> +            while ((select_ret = select(fd+1,&myset, NULL, NULL,&tv)) == -1)
>> +            {       
>> +                    if (errno == EINTR)
>>      
> If any other error occurs (such as EBADF or EINVAL, one of which you can
> readily make occur by using a USB serial interface and unplugging it
> while the program is running) you'll loop forever here (and again
> below).
>    

OK, I had no idea about these other errors...  I lifted this out of the 
GS2 modbus code ... and they only included this EINTR error, but if 
these other errors codes are also relevant they should obviously be 
included.
Onboard RS232 is getting more rare each day.  So handling these other 
errors is important.  I can look at those and include them as well.


>    
>> +      while (select_ret)    // select_ret is>1 or true if select() has 
>> found chars are in buffer
>>      
> comment is wrong.  the loop will execute as long as select_ret != 0.
> Since you handle the possible negative results, other accurate comments
> would say>0 or>=1.  On the other hand, if you really think it's
> important to call attention to the positive nature of select_ret(?) then
> put it in the code and ditch the comment:
>          while(select_ret>0)
>
>    
 >>   while(select_ret>0)

I like this as it is more obvious..


>> +                    if (read_ret == -1)
>> +                            {     // read failed
>> +                                    if ( ModbusDebugLevel>=2 )
>> +                                             printf("Read Port failure\n"); 
>> +                              return 0;
>> +                            }
>>      
> please use good indentation practice in new code.
>
>    
OK..

>> -    int ResponseSize;
>> +    int ResponseSize=0;
>>      
> this is good because otherwise ResponseSize can be returned
> uninitialized when select returns<= 0.  On the other hand, it's
> unrelated to the main thrust of this patch (making serial work) so
> prefer to submit as a separate patch.
>
>    

Is it a good idea to leave the possibility of uninitialized vars even 
though  it is not immediately connected with the serial problem..

I understand your viewpoint.  If the patch is hard to understand, it 
makes review of it more difficult, but from my standpoint, if I have a 
separate patch for typos, initialization, and the actual fix..  how do I 
manage that  with the source code?


>> @@ -308,80 +311,111 @@ void SocketModbusMasterLoop( void )
>>      
> I had trouble making out just what you did in this function, because the
> diff is like an omelet, but it looks like the old code had a structure
>      if(socket) A1
>      else /* serial */ A2
>      B
>      if(socket) C1
>      else /* serial */ C2
>      D
> with common code interleved, and you changed it to
>      if(socket) {
>          A1
>          B
>          C1
>          D
>      } else /* serial */ {
>          A2
>          B
>          C2
>          D
>      }
> and then went on to modify B and D in one or both sides of the 'if'.
> That's fine, and it's not entirely your fault that the diff comes out so
> hard to read.
>    

Yes, sorry this is one of those things that logically might have been 
correct in the first place, but I felt that Modbus TCP  had to be 
separated from  Modbus RTU in order to facilitate debugging.

>    
>> +                                    ResponseSize = SerialReceive( 
>> ResponseFrame, 1/*adr*/+GetModbusResponseLenghtToReceive()+2/*crc*/, 
>> ModbusTimeOutReceipt );
>> +                                    NbrFrames++;
>> +
>>                                  if ( ResponseSize==0 )
>>                                      printf("ERROR CLASSICLADDER-  MODBUS NO 
>> RESPONSE (Errs=%d/%d) !?\n", ++CptErrors, NbrFrames);
>>      
> A ResponseSize different than expected--not just a zero-byte response--
> should be an error.  It's unlikely that a short transmission would pass
> the CRC test in TreatModbusMasterResponse but it will happen at a rate
> of about 1 in 2^16 random short packets.
>
> [I see now that this code is unchanged compared to the original, so
> don't sweat it.. on the other hand, something to fix for the future...]
>
>    

Yes, but there is something not right here as you point out.   They must 
not have been checking for the correct number of chars in the first 
place since the number of chars that were expected to be received were 
wrong in the original code.
This code has errors piled on errors and then to get around some of the 
errors they ignored some of the code that was written  - like the 
calculated length of the response!

>> +                    else  // no question to ask - pause for 300 ms
>> +                            DoPauseMilliSecs( 50 );
>>      
> Again, comments are worse than useless when they lie about what the code
> does.
>
>    

Agreed ...

>> -    }
>> -#ifndef __WIN32__
>> -    pthread_exit(NULL);
>> -#endif
>>      
> You almost certainly didn't intend to remove this code.
>    
Actually I did at the time.  At the time when I cut that out, I didn't 
see any reason to keep any Win32 related code in there.  Unfortunately 
after that I realized that the the preprocessor directives for Win32 are 
scattered throughout the code and IMO clutter it up.    The most recent 
version of Classic Ladder has even more preprocessor directives for some 
other realtime systems, which worsens the situation.
> Jeff
>
>    

Where do we go from here?    I think I should check out the Modbus Force 
coil functions again as now that simply doesn't look right.    I believe 
that I have an Automation Direct PLC here also that I could plug into... 
if I can find all of the modules .. :-)   I have the same Automation 
Direct PLC setup on my lathe and I need to incorporate these Modbus 
changes on that machine to get rid of comm errors.

Dave

------------------------------------------------------------------------------
ThinkGeek and WIRED's GeekDad team up for the Ultimate 
GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the 
lucky parental unit.  See the prize list and enter to win: 
http://p.sf.net/sfu/thinkgeek-promo
_______________________________________________
Emc-developers mailing list
Emc-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/emc-developers

Reply via email to