I'm truying to go deeper in Xml analysing but I'm really annoying by
what I read in specification w3c.
Especially about your point 7.
I have noted this delimiters :
1. < >
2. <!-- -->
3. <? ?>
4. <![CDATA[ ]]>
5. <!DOCTYPE >
6. <! >
They all give problems if I try to parse only on <> :
1. it's ok ;)
2. can have < and/or > inside
3. it's ok too.
4. can have < and/or > inside
5. can have <!-- --> <! > and []
6. it's ok
I can't see what idea can make a good parse whitout doing it char by
char.
JC
George Makrydakis a écrit :
Nice, some comments:
1. using character to character parsing makes it more time - consuming
while processing
2. i have fixed the current work to parse the DTD elements too
(something that is not done by most small C++ xml parsers, they can
crash with them).
3. if it always expects to find <. or > it can crash or loop forever,
even in valid documents.
4. I do not like loading the entire file in memory too. Infact i want
a sax - like parser, loading it for now is just for testing purposes.
5. Right now I am concentrated on minimizing loop demands.
6. There is the possibility that we end up having a validator part
within the parser too :)
7. Remember that the syntactically important elements within XML are
ONLY the <,> characters, slashes come as second priority AFTER you
have created a formatted XML string.
8. When handling memory aspects in C++, since it offers some "garbage"
collection (of some sort) it is very useful to use
constructor/destructor stuff.
9. I very much like the simplicity in your code; less lines mean less
debugging, and mostly that things are done right.
You will have my version ready for prime time soon.
More to come soon...
Thank you all
George Makrydakis
gmak
Jean Charles Passard wrote:
Here is a try on the same idea (<> and ><)
But I have prefered not to load the complete file in memory.
It's my really first try in c++, then I have surely not use really
well the objects.
off course, I suppose there is no syntax errors in xml, then I do not
make controls.
JC Passard
---------------------------------------------------------------------------
#include <iostream>
#include <istream>
#include <fstream>
using namespace std;
int decode_stream (istream& is, string& cdata) {
char c;
static int is_open = 0;
is.get(c);
while (is.good()) {
if (c == '<') {
if (!is_open) break;
is_open ++;
}
if (c == '>') {
is_open --;
if (!is_open) break;
}
cdata += c;
is.get(c);
}
if (!is.good()) return 0;
if (c == '<') {
is_open ++;
return 1;
}
return 2;
}
int ismisc (string& cdata) {
if (!isalpha (cdata[0]) && cdata[0] != '/') return 1;
return 0;
}
int istag (string &cdata) {
if (isalpha (cdata[0]) || cdata[0] == '/') return 1;
return 0;
}
int analyze_outside (string& cdata) {
cout << "Outside : " << cdata << endl;
return 0;
}
int analyze_inside (string& cdata) {
if (ismisc (cdata)) {
cout << "Misc Data : " << endl;
cout << cdata << endl << endl;
return 0;
}
if (istag (cdata)) {
cout << "Tag data : " << endl;
cout << cdata << endl << endl;
return 0;
}
}
int analyze_stream (istream& is) {
string cdata;
int find_it;
while (find_it = decode_stream (is, cdata)) {
if (find_it == 1) analyze_outside (cdata); // find_it <
if (find_it == 2) analyze_inside (cdata); // find_it >
cdata.clear();
}
return 0;
}
int main () {
fstream file;
file.open ("test.xml");
analyze_stream (file);
file.close ();
return 0;
}
---------------------------------------------------------------------------
George Makrydakis a écrit :
No misunderstandings please... This is what I was working on:
The only bug to fix has to do with DTD (minor one but it crashes
it...)
Working together means that I must do marathon running?
Geez..., do not mix premature constructive criticism with the
need to not be
releasing buggy stuff..
The code below works if you take out DTD elements out of any xml
file that is VALID.
Handles the <,> and >,< pairs correctly no matter how weird the
syntax is...
IT IS BUGGY BUT IT IS UNINFORMED, and most of all *SMALL*
Thank you for making my trouble worth nothing, you could not
wait a couple of days more, could you...
----------------------------------------CUT---------------------------------------------
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <cstdio>
using namespace std;
int main ()
{
string linebuffer;
int lnct = 0;
vector<string> myvector;
vector<string> processing;
string testing;
string grabITEM;
myvector.clear();
ifstream myfile("coreutils.xml"); // take out DTD stuff please...
if ( myfile.is_open() )
{
while (getline(myfile,linebuffer,'\n'))
{
myvector.push_back(linebuffer);
}
myfile.close();
linebuffer.clear();
}
else
{
cout << "file not found!" << endl;
}
for (lnct = 0; lnct < myvector.size(); lnct++)
{
testing = myvector.at(lnct);
while ( !testing.empty() )
{
if (!linebuffer.empty()) { testing = linebuffer + "
" + testing; }
linebuffer.clear();
int stopTAG = testing.find_first_of(">");
int openTAG = testing.find_first_of("<");
if ( ( openTAG == string::npos ) || ( stopTAG ==
string::npos ) )
{
if (( openTAG == string::npos ) && ( stopTAG ==
string::npos ))
{
cout << testing << endl;
testing.clear();
break;
}
else if (( openTAG != string::npos ) && (
stopTAG == string::npos ))
{
linebuffer = testing.substr(openTAG);
cout << testing.substr(0, openTAG) << endl;
testing.clear();
break;
}
}
cout << testing.substr(0, openTAG) << endl;
grabITEM = testing.substr(openTAG, stopTAG + 1 -
openTAG);
cout << grabITEM << endl;
testing = testing.substr(stopTAG + 1);
}
}
myvector.clear();
return 0;
}
------------------------------------- CUT
---------------------------------------------------
--
http://linuxfromscratch.org/mailman/listinfo/alfs-discuss
FAQ: http://www.linuxfromscratch.org/faq/
Unsubscribe: See the above information page