Hi,

I need to read XML files from our suppliers, who generally use the ONIX spec when generating their files. These files contain book metadata, and at a high level they have a fairly simple makeup: one header section and one or more product sections.
My approach so far has been to use Reader to avoid loading entire files into memory, and for the most part this works fine. The only place it falls down is when a file contains an entity other than &amp;, &lt;, &gt; or a numeric one.

Here's a sample file that uses the &ndash; entity:

  http://gist.github.com/79386

And here's a contrived example that uses Reader to extract the Header and Product records:

  http://gist.github.com/79387

If you run this, it outputs the following non-fatal error and doesn't return the full text of the Product node:

  ~/git/onix.git master$ ruby examples/entities.rb
  <Header>
    <FromCompany>HarperCollins Publishers</FromCompany>
    <ToCompany>Australian Booksellers Association</ToCompany>
    <SentDate>20081106</SentDate>
  </Header>
  Error: Entity 'ndash' not defined at examples/../data/entities.xml:28.
  --

I have 2 questions:

- The ONIX DTD defines a range of entities, including &ndash;. Can I get libxml/Reader to recognise them?
- Failing that, can I get Reader to just return entities unmodified instead of exiting with an error?

I've tried passing various options to the Reader constructor (like XML::Parser::Options::NOENT) to no avail.

Cheers

-- 
James Healy <jimmy-at-deefa-dot-com>  Sun, 15 Mar 2009 22:18:44 +1100
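
For reference, one possible stopgap while the questions above are open would be to rewrite named entities as numeric character references before the parser sees the text, since numeric references already work. The following is only a minimal sketch in plain Ruby, not a libxml feature: the NAMED_ENTITIES table and the numeric_entities helper are hypothetical names, and the table holds just a few illustrative entries rather than the full set the ONIX DTD declares.

```ruby
# Hypothetical, deliberately tiny subset of named entities. The ONIX DTD
# declares many more; extend this table from the DTD as needed.
NAMED_ENTITIES = {
  "ndash" => 0x2013,
  "mdash" => 0x2014,
  "rsquo" => 0x2019
}.freeze

# Replace known named entities with hexadecimal character references.
# Anything not in the table (including &amp;, &lt; and &gt;) is returned
# unchanged, so the parser still handles those itself.
def numeric_entities(text)
  text.gsub(/&([A-Za-z][A-Za-z0-9]*);/) do
    code = NAMED_ENTITIES[Regexp.last_match(1)]
    code ? "&#x#{code.to_s(16).upcase};" : Regexp.last_match(0)
  end
end

puts numeric_entities("<Title>1999&ndash;2008 &amp; beyond</Title>")
```

Because the substitution works on plain strings, it could be applied line by line (for example with File.foreach) so the whole file never has to sit in memory. Note that this naive version would also rewrite entity-like text inside CDATA sections and comments, which may or may not matter for a given feed.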