I use FedoraGSearch for indexing and have the same problem with about 20% of the PDFs in my collection. The log shows the following:
org.xml.sax.SAXParseException: Character reference "" is an invalid XML character

The same happens for "�".

I have heard from other people that this is a known Lucene/Solr parsing problem. So far I don't know how to solve it. Another problem is that this error prevents even the metadata of objects with "unusual" PDFs from being indexed.

Please let us know if you find a solution.

Serhiy

On Wed, Apr 14, 2010 at 10:21 AM, tasai Kan <[email protected]> wrote:
> Hello. As the title says, I edited the demoFoxmlToSolr file to index PDF
> files (they are externally referenced content). But it only works for some
> PDF files and not for others. Does it depend on something?
> Thanks in advance.
>
> ------------------------------------------------------------------------------
> Download Intel® Parallel Studio Eval
> Try the new software tools for yourself. Speed compiling, find bugs
> proactively, and fine-tune applications for parallel performance.
> See why Intel Parallel Studio got high marks during beta.
> http://p.sf.net/sfu/intel-sw-dev
> _______________________________________________
> Fedora-commons-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
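A possible cause, which is an assumption and not something confirmed in this thread, is that the text extracted from some PDFs contains control characters that XML 1.0 does not allow at all, not even as numeric character references such as `&#1;`. The XML 1.0 `Char` production only permits #x9, #xA, #xD, [#x20-#xD7FF], [#xE000-#xFFFD] and [#x10000-#x10FFFF]. The minimal Python sketch below filters everything else out before the text is embedded in a Solr `<field>` element. `strip_invalid_xml_chars` and `to_solr_field` are hypothetical helpers and are not part of FedoraGSearch (whose extraction runs in Java/XSLT), so this only illustrates the filtering rule:

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Everything NOT allowed by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Remove characters that may not appear in an XML 1.0 document,
    either literally or as numeric character references."""
    return _INVALID_XML_CHARS.sub("", text)


def to_solr_field(name: str, text: str) -> str:
    """Build a (hypothetical) Solr <field> element from extracted PDF text,
    dropping illegal characters first and then escaping &, < and >."""
    return "<field name=%s>%s</field>" % (
        quoteattr(name),
        escape(strip_invalid_xml_chars(text)),
    )


if __name__ == "__main__":
    raw = "Title\x00 with\x01 control\x0b chars\ttab"
    print(to_solr_field("text", raw))
```

Removing the characters (rather than replacing them) is just one choice; substituting a space would keep word boundaries intact if the control characters sit between words.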
