Jean Charles Passard wrote:
I'm trying to go deeper into XML parsing, but I'm really annoyed by
what I read in the W3C specification.
Especially about your point 7.
I have noted these delimiters:
1. < >
2. <!-- -->
3. <? ?>
4. <![CDATA[ ]]>
5. <!DOCTYPE >
6. <! >
They all give problems if I try to parse only on <> :
1. it's ok ;)
2. can have < and/or > inside
3. it's ok too.
4. can have < and/or > inside
5. can have <!-- --> <! > and []
6. it's ok
I can't see what idea can make a good parser without doing it char by
char.
Of course you have to parse the input character by character. The way
to do this is with a state machine. When you get a '<' character you go
into an intermediate state. You then read the next character to decide
what state to go into next.
I wrote a program once to count lines of C/C++ code that is not unlike
this problem. When you take the issue of comments in C/C++ as well as
directives and tokens, it's quite similar to the XML problem. I am
attaching the code as an example.
You can build the program with a simple: gcc -o count count-methods.c
To test, use: ./count -m count-meth*{c,h}.
The code is reasonably well commented. :)
I also wrote a more sophisticated program to count the number of comment
words, variables, etc and the frequency of use, but I can't find it
right now.
-- Bruce
------------------------------------------------------------------------
/******************************************************
* FILENAME: count-methods.c
* File for Program 3A from A Discipline for
* Software Engineering
*
* Author: B. Dubbs
* Start Date: 18 May 1998 (coding)
*
* EXPORTED FUNCTIONS: None
*
* Notes:
* Using PSP0.1
Write a program to count the logical lines of code (LOC), by
methods/functions, in a set of programs.
Note: A lot of design/code can be reused from Program 2A,
Count LOC in program source files.
Main program design:
1. Declare and initialize global statistics
Statistics include:
Total LOC
Total Blank Lines
Total Comment Lines
Total LOC with comments
Total Functions
2. Input
Get the list of programs to count lines of code. This
input will be from the command line. The program will not
handle wildcards.
If an error is encountered (no files specified), output an
appropriate error message.
3. Loop for each file input, count lines of code, by function
and output results.
a. Open the file. If the file cannot be opened, output an error
message and restart the loop.
b. Initialize statistic counters. The states the system can be
in are: inCode, inCommentC, inCommentCpp. The initial state
is inCode. The initial previous character is NUL.
To determine when a function starts, we keep the most recent
alphanumeric word. If the word is immediately followed by a '('
(after optional whitespace) and the line did not start with a
'#' (a macro), the function has started. We must also check to
see if a '{' occurs after the function declaration. If not,
the function is declared, but not implemented.
A function level is incremented when a '{' is found and
decremented when a '}' is found. When the level is decremented
to zero, the function ends.
Initialize functionLevel and nameString.
c. For each character in the file:
when asterisk
if previous character is a '/' and mode != inCommentCpp then
set mode = inCommentC
when slash
if previous character is a '/' and mode == inCode then
set mode = inCommentCpp
else if previous character is an '*' and mode == inCommentC
set mode = inCode
when newline or end-of-file
update file counts (line-has-comment, line-has-code)
if function-state is START, add line to pending count
if print flag set or function-state is MIDDLE,
update function counts
reset code, comment, print, and macro flags
when '{'
if inCode
if function-state = START
set function-state = MIDDLE
add pending lines of code
if function-state = MIDDLE, increment nesting
when '}'
if inCode and function-state == MIDDLE,
decrement nesting
if nesting == 0
increment function count
set print flag
set function-state = NONE
when '('
if inCode and not inMacro and function-state == NONE
set function-state = START
copy current-name to function-name
clear function counts
when ';'
if function-state == START and inCode
set function-state = NONE
when alphanumeric or '_'
if inCode
if function-state == NONE
if previous character is not alphanumeric
clear current name
append character to current name
set line-has-code flag
else
set line-has-comment flag
when other
do nothing
save current character as previous character
d. Close the input file.
e. Print the statistics for the file.
f. Accumulate global statistics.
4. Print the total statistics for all files.
5. Exit
***************
Supporting Functions
Clear file counts
Set all static file counters to zero
Clear function counts
Set all static function counters to zero
Clear Total counts
Set totals to zero
Print Function Counts
Print file Counts
Print statistics for above
Increment Pending LOC
Add Pending LOC to Function totals
Update file counts (line-has-comment, line-has-code)
Update function counts (line-has-comment, line-has-code)
(These are essentially the same but work on file and
function counts respectively -- only one function updates
the previous-line-blank flag)
if line-has-comment and line-has-code
increment code-with-comment counter
increment code counter
clear previous-line-blank flag
else if line-has-comment
increment comment counter
clear previous-line-blank flag
else if line-has-code
increment code counter
clear previous-line-blank flag
else
if previous-line was not blank
increment blank-line counter
set previous-line-blank flag
endif
**********************************************************************/
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h> /* exit() */
#include <ctype.h>
#include <string.h>
#include "count-methods.h"
char* argv0;
int main(int argc, char* argv[])
{
unsigned int i;
int methods = 0;
char filename[256];
int ch;
extern char* optarg;
int skip = 0;
/**********************************************************
* 1. Initialize global statistics
*/
ClearTotalCounts();
*filename = 0;
argv0 = argv[0];
/**********************************************************
* 2. Input
*
* Get the list of programs to count lines of code by function.
* This input will be from the command line. The program will not
* handle wildcards.
*
* If an error is encountered (no files specified), output an
* appropriate error message.
*/
if (argc < 2) usage();
/* 2a. Get arguments */
while ((ch = getopt(argc, argv, "f:m")) != -1)
{
switch (ch)
{
case 'm':
methods = 1;
skip++;
break;
case 'f':
strncpy(filename, optarg, sizeof(filename) - 1);
filename[sizeof(filename) - 1] = '\0'; /* truncate overly long names */
skip +=2;
break;
default:
usage();
break;
}
}
//printf("argc=%i, methods=%i, filename=%s \n", argc, methods, filename);
//return;
/**********************************************************
* 3. Loop for each file input, count the lines of code, by function
* and output results.
*
* a. Open the file. If the file cannot be opened, output an error
* message and restart the loop.
*
* b. Initialize statistic counters. The states the system can be
* in are: inCode, inCommentC, inCommentCpp. The initial state
* is inCode. The initial previous character is NUL.
*/
if ( *filename != '\0' ) /* a list file was given with -f */
{
FILE* list;
FILE* currentFile;
char file[256];
list = fopen(filename, "r");
if ( list == NULL )
{
fprintf(stderr, "Could not open file list %s.\n", filename);
}
else
{
while ( fgets(file, 255, list) != NULL )
{
if (file[strlen(file)-1] == '\n') file[strlen(file)-1] = '\0';
if ( strlen(file) == 0 ) continue;
currentFile = fopen(file, "r");
if (currentFile == NULL)
{
fprintf(stderr, "Could not open file %s.\n", file);
continue;
}
ProcessFile(currentFile, methods, file);
}
fclose(list);
}
}
for (i = optind; i < (unsigned int)argc; i++) /* optind: first non-option argument */
{
FILE* currentFile;
currentFile = fopen(argv[i], "r");
if (currentFile == NULL)
{
fprintf(stderr, "Could not open file %s.\n", argv[i]);
continue;
}
ProcessFile(currentFile, methods, argv[i]);
}
PrintTotalCounts();
return 0;
}
void ProcessFile(FILE* currentFile, int methods, char* filename)
{
int previousChar;
int lineHasCode;
int lineHasComment;
int inMacro;
int printFlag;
int nesting;
char functionName[256];
char currentName[256];
enum CODE_STATUS mode;
enum FUNCTION_STATUS functionState;
ClearFileCounts();
previousChar = 0;
nesting = 0;
lineHasCode = FALSE;
lineHasComment = FALSE;
inMacro = FALSE;
printFlag = FALSE;
mode = IN_CODE;
functionState = NONE;
currentName[0] = '\0';
/**********************************************************
* c. For each character in the file:
* when asterisk
* if previous character is a '/' and mode != inCommentCpp then
* set mode = inCommentC
*
* when slash
* if previous character is a '/' and mode == inCode then
* set mode = inCommentCpp
* else if previous character is an '*' and mode == inCommentC
* set mode = inCode
*/
while (TRUE)
{
int currentChar;
currentChar = fgetc(currentFile);
#ifdef DEBUG
if (currentChar == EOF) printf("\nEOF\n");
else putchar(currentChar);
#endif
switch (currentChar)
{
case '*':
if (previousChar == '/' && mode != IN_COMMENT_CPP)
{
mode = IN_COMMENT_C;
}
break;
case '/':
if (previousChar == '/' && mode == IN_CODE)
{
mode = IN_COMMENT_CPP;
}
else if (previousChar == '*' && mode == IN_COMMENT_C)
{
mode = IN_CODE;
}
break;
/* when newline or end-of-file
* if code and function only started, count line as pending
* update counts (line-has-comment, line-has-code)
* need to take care of last line of function when not MIDDLE
* so we use the print flag
* if print flag is set, print function counts
* reset code, comment, print, and macro flags
*/
case EOF:
case '\n':
if (previousChar=='\n' && currentChar == EOF)
{
break;
}
if (lineHasCode && functionState == START)
{
IncrementPending();
}
if (functionState == MIDDLE || printFlag)
{
UpdateFunctionCounts(lineHasComment, lineHasCode);
}
UpdateFileCounts(lineHasComment, lineHasCode);
if (printFlag && methods == 1)
{
PrintFunctionCounts(functionName);
}
if (mode == IN_COMMENT_CPP)
{
mode = IN_CODE;
}
#ifdef DEBUG
printf("lineHasCode = %i ", lineHasCode);
printf("lineHasComment = %i ", lineHasComment);
printf("printFlag = %i ", printFlag);
printf("mode=%i ", mode);
printf("fctnState=%i ", functionState);
printf("inMacro=%i ", inMacro);
printf("nesting=%i\n", nesting);
#endif
lineHasCode = FALSE;
lineHasComment = FALSE;
inMacro = FALSE;
printFlag = FALSE;
break;
/* when '{'
* if inCode
* if function-state == START, set function-state = MIDDLE
* if function-state == MIDDLE, increment nesting
*
* when '}'
* if inCode and function-state == MIDDLE,
* decrement nesting
* if nesting == 0
* set print flag
* set function-state = NONE
* increment function count
*
* when '('
* if inCode and not inMacro and function-state == NONE,
* set function-state = START
* copy current-name to function-name
* clear function counts
*/
case '{':
#ifdef DEBUG
printf("\n{ mode=%i, functionState=%i\n", mode,
functionState);
#endif
if (mode == IN_CODE)
{
if (functionState == START) /* Now we really have a
function */
{
functionState = MIDDLE;
AddPending();
}
if (functionState == MIDDLE)
{
nesting++;
}
}
break;
case '}':
#ifdef DEBUG
printf("\n} mode=%i, functionState=%i\n", mode,
functionState);
#endif
if (mode == IN_CODE && functionState == MIDDLE)
{
nesting--;
if (nesting == 0) /* End of function found */
{
IncrementFunctionCount();
printFlag = TRUE; /* Print at end of line */
functionState = NONE;
}
}
break;
case '(': /* looking for start of function */
#ifdef DEBUG
printf("\n( mode=%i, functionState=%i, inMacro=%i\n",
mode, functionState, inMacro);
#endif
if (mode == IN_CODE && functionState == NONE &&
!inMacro)
{
functionState = START;
strcpy(functionName, currentName);
ClearFunctionCounts();
#ifdef DEBUG
printf("\n( functionName=%s, functionState=%i\n",
functionName, functionState);
#endif
}
break;
/* when ';'
* if function-state == START and inCode
* set function-state = NONE
*
* when '#'
* if inCode and line-has-code is false,
* set inMacroFlag
*/
case ';':
if (functionState == START && mode == IN_CODE)
{
functionState = NONE; /* Only a declaration */
}
break;
case '#':
if (mode == IN_CODE && !lineHasCode)
{
inMacro = TRUE;
}
break;
default:
break;
} /* switch */
if (currentChar == EOF)
{
break;
}
/* when alphanumeric or '_'
* if inCode
* if function-state == NONE
* if previous character is not alphanumeric
* clear current name
* append character to current name
* set line-has-code flag
* else
* set line-has-comment flag
*
* when other
* do nothing
*/
if (isalnum(currentChar) || currentChar == '_') /* fgetc() value is already in unsigned char range */
{
if (mode == IN_CODE)
{
lineHasCode = TRUE;
if (functionState == NONE) /* Looking for function name */
{
int length;
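/* A new identifier starts unless the previous character was
 * part of one ('_' counts, so my_func is kept whole). */
if (!isalnum(previousChar) && previousChar != '_')
{
currentName[0] = 0;
}
length = strlen(currentName);
if (length < (int)sizeof(currentName) - 1) /* drop excess characters */
{
currentName[length] = (char)currentChar;
currentName[length+1] = 0;
}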
}
}
else
{
lineHasComment = TRUE;
}
}
/* save current character as previous character
*/
previousChar = currentChar;
} /* While char not EOF */
/***************************************************************
* d. Close the input file
* e. Print the statistics for the file.
* f. Accumulate global statistics
*
* 4. Print the total statistics for all files.
*/
fclose(currentFile);
PrintFileCounts(filename);
UpdateTotalCounts();
}
/*******************************************************************
* Helper functions to manage total counts
*
* Counts are local to file
*
*/
static unsigned int totalCode;
static unsigned int totalComments;
static unsigned int totalBlank;
static unsigned int totalCommentsWithCode;
static unsigned int totalFunctions;
static unsigned int totalFiles;
static unsigned int headerPrinted;
static unsigned int fileCodeLines;
static unsigned int fileCommentLines;
static unsigned int fileBlankLines;
static unsigned int fileCommentsWithCode;
static unsigned int previousLineBlank;
void ClearTotalCounts(void)
{
totalCode = 0;
totalComments = 0;
totalBlank = 0;
totalCommentsWithCode = 0;
totalFunctions = 0;
totalFiles = 0;
headerPrinted = FALSE;
}
void UpdateTotalCounts(void)
{
totalCode += fileCodeLines;
totalComments += fileCommentLines;
totalBlank += fileBlankLines;
totalCommentsWithCode += fileCommentsWithCode;
}
void PrintTotalCounts(void)
{
PrintHeader();
printf("\n%13u%12u%14u%19u Total\n",
totalCode, totalBlank, totalComments, totalCommentsWithCode);
printf("\n\nTotal Functions %u\n", totalFunctions);
}
/*************************************************************************
* Helper functions to manage file counts
*/
void IncrementFileCount(void)
{
totalFiles++;
}
void ClearFileCounts(void)
{
fileCodeLines = 0;
fileCommentLines = 0;
fileBlankLines = 0;
fileCommentsWithCode = 0;
previousLineBlank = FALSE;
}
void UpdateFileCounts(int lineHasComment, int lineHasCode)
{
if (lineHasComment && lineHasCode)
{
fileCommentsWithCode++;
fileCodeLines++;
previousLineBlank = FALSE;
}
else if (lineHasComment)
{
fileCommentLines++;
previousLineBlank = FALSE;
}
else if (lineHasCode)
{
fileCodeLines++;
previousLineBlank = FALSE;
}
else
{
if (previousLineBlank == FALSE)
{
fileBlankLines++;
}
previousLineBlank = TRUE;
}
#ifdef DEBUG
printf("fileCodeLines=%u\n", fileCodeLines);
#endif
}
void PrintHeader(void)
{
if (!headerPrinted)
{
printf("Lines_of_code Blank_Lines Comment_Lines "
"Comments_with_Code Func name/Filename\n");
headerPrinted = TRUE;
}
}
void PrintFileCounts(char* filename)
{
PrintHeader();
printf("%13u%12u%14u%19u %s\n",
fileCodeLines, fileBlankLines, fileCommentLines,
fileCommentsWithCode, filename);
}
/*************************************************************************
* Helper functions to manage function counts
*/
static unsigned int functionCodeLines;
static unsigned int functionCommentLines;
static unsigned int functionBlankLines;
static unsigned int functionCommentsWithCode;
static unsigned int functionCodePending;
void IncrementPending(void)
{
functionCodePending++;
}
void AddPending(void)
{
functionCodeLines += functionCodePending;
}
void IncrementFunctionCount(void)
{
totalFunctions++;
}
void ClearFunctionCounts(void)
{
functionCodeLines = 0;
functionCommentLines = 0;
functionBlankLines = 0;
functionCommentsWithCode = 0;
functionCodePending = 0;
}
/* Must be called before UpdateFileCounts to take care of
* previous blank line update.
*/
void UpdateFunctionCounts(int lineHasComment, int lineHasCode)
{
if (lineHasComment && lineHasCode)
{
functionCommentsWithCode++;
functionCodeLines++;
}
else if (lineHasComment)
{
functionCommentLines++;
}
else if (lineHasCode)
{
functionCodeLines++;
}
else
{
if (previousLineBlank == FALSE)
{
functionBlankLines++;
}
}
#ifdef DEBUG
printf("functionCodeLines=%u\n", functionCodeLines);
#endif
}
void PrintFunctionCounts(char* functionName)
{
PrintHeader();
printf("%13u%12u%14u%19u %s\n",
functionCodeLines, functionBlankLines, functionCommentLines,
functionCommentsWithCode, functionName);
}
void usage(void)
{
(void) fprintf(stderr, "usage: %s [-f file] [-m] [file1 ... ]\n", argv0);
(void) fprintf(stderr, " m : print method/function counts\n");
(void) fprintf(stderr, " f filename: get files to count from filename\n");
(void) fprintf(stderr, " a file or list of files must be specified\n");
exit(1);
}
------------------------------------------------------------------------
/******************************************************
* FILENAME: count-methods.h
* Header file for Program 3A from A Discipline for
* Software Engineering
*
* Author: B. Dubbs
* Start Date: 20 May 1998
*
* EXPORTED FUNCTIONS: None
*
* Notes:
*
* *****************************************************/
#ifndef _count_methods
#define _count_methods
#include <stdio.h> /* FILE, used by ProcessFile() */
#define TRUE 1
#define FALSE 0
enum CODE_STATUS
{ IN_CODE,
IN_COMMENT_C,
IN_COMMENT_CPP
};
enum FUNCTION_STATUS
{ START,
MIDDLE,
NONE
};
void ClearTotalCounts(void);
void UpdateTotalCounts(void);
void PrintTotalCounts(void);
void ClearFileCounts(void);
void UpdateFileCounts(int lineHasComment, int lineHasCode);
void PrintFileCounts(char* filename);
void ClearFunctionCounts(void);
void UpdateFunctionCounts(int lineHasComment, int lineHasCode);
void PrintFunctionCounts(char* functionName);
void IncrementPending(void);
void AddPending(void);
void IncrementFunctionCount(void);
void IncrementFileCount(void);
void ProcessFile(FILE* currentFile, int methods, char* file);
void PrintHeader(void);
void usage(void);
#endif