cpplove wrote:
Hello everyone,
I'm trying to parse Wikipedia XML with the SAX2 parser of Xerces-C 2_7_0.
When I set the parser to parse from a MemBufInputSource, I get the error below.
All the files in the collection come up with the same error.
Fatal Error at file R:\000\41000.xml, line 2, char 2
Message: Expected comment or CDATA
This error occurs when the parser encounters the character sequence '<'
'!' which can be either the start of a comment or a CDATA section, but
it finds neither of those constructs. Are you sure the data stream
you're providing is correct, and the length correct? It might be
worthwhile for you to dump the buffer before you parse it, just to see
what the parser is going to get.
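
A minimal sketch of such a dump, in case it helps. The helper name hex_prefix is made up for illustration and is not part of Xerces-C; it only formats the first few bytes of whatever buffer and length you are about to hand to the parser, so you can compare them with what you expect the file to start with.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not a Xerces-C API): format the first `limit` bytes
// of a buffer as space-separated uppercase hex, e.g. "3C 3F 78 6D 6C".
std::string hex_prefix(const unsigned char* data, std::size_t length,
                       std::size_t limit = 16)
{
    assert(data != nullptr || length == 0);

    std::string out;
    char piece[4];  // two hex digits plus the terminating '\0'
    for (std::size_t i = 0; i < length && i < limit; ++i) {
        std::snprintf(piece, sizeof piece, "%02X",
                      static_cast<unsigned>(data[i]));
        if (!out.empty()) out += ' ';
        out += piece;
    }
    return out;
}
```

Printing hex_prefix(file, zzz) next to zzz itself, just before constructing the input source, would show both the leading bytes and the length the parser is being given.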
But when I try
parser->parse(filename), it does not give me any errors.
My problem is that I have to extend code that already reads the file into a
buffer,
so I want to use a memory buffer to avoid doing the disk I/O again.
It looks like the following:
filename = disk.get_first_filename(argv[param]);
file = (unsigned char *)disk.read_entire_file();
//
// read_entire_file() looks like the following:
// if ((fp = fopen(filename, "rb")) == NULL) return NULL;
// if (fstat(fileno(fp), &details) == 0)
//   if ((*file_length = details.st_size) != 0)
//     if ((block = new (std::nothrow) char [(long)(details.st_size + 1)]) != NULL) // +1 for the '\0' on the end
//       if (fread(block, (long)details.st_size, 1, fp) == 1)
//         block[details.st_size] = '\0';
//
============================================================================
unsigned int zzz = strlen( (const char*)file); // verified this
Can you give more details on what you mean by "verified this?"
MemBufInputSource* memIS = new MemBufInputSource (
(const XMLByte*) file ,
zzz ,
filename ,
false );
// memIS->setEncoding(XMLUni::fgUTF8EncodingString); // tried turning this on
// the file being loaded is UTF-8
// saved as UTF-8 encoding
This shouldn't be necessary if the stream contains a well-formed document.
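
On the length question above: strlen() stops at the first zero byte, so it can only match the file size if the data contains no embedded '\0'. Whether that is what is happening here is a guess, but it is cheap to check. The sketch below uses two made-up helpers, read_all_bytes and c_string_length, as stand-ins for read_entire_file() and the strlen() call; neither is part of the posted code or of Xerces-C.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <iterator>
#include <vector>

// Hypothetical stand-in for read_entire_file(): reads the file in binary
// mode and keeps the byte count with the data (bytes.size()).
// Returns an empty vector if the file cannot be opened.
std::vector<unsigned char> read_all_bytes(const char* path)
{
    assert(path != nullptr);

    std::vector<unsigned char> bytes;
    std::ifstream in(path, std::ios::binary);
    if (!in) return bytes;
    bytes.assign(std::istreambuf_iterator<char>(in),
                 std::istreambuf_iterator<char>());
    return bytes;
}

// Hypothetical check: the length strlen() would report for this data once a
// '\0' terminator is appended, i.e. the offset of the first zero byte, or
// the full size if there is none.
std::size_t c_string_length(const std::vector<unsigned char>& bytes)
{
    if (bytes.empty()) return 0;
    const void* nul = std::memchr(bytes.data(), 0, bytes.size());
    if (nul == nullptr) return bytes.size();
    return static_cast<std::size_t>(
        static_cast<const unsigned char*>(nul) - bytes.data());
}
```

If c_string_length(bytes) and bytes.size() differ, the buffer is being truncated by strlen(), and the real byte count (the value read_entire_file() already stores in *file_length) would be the safer thing to pass as the byteCount argument of MemBufInputSource.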
Dave