I use FedoraGSearch for indexing and have the same problem with about 20%
of the PDFs in my collection. The log shows the following:

org.xml.sax.SAXParseException: Character reference "&#24" is an
invalid XML character

The same happens for "&#0".
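For context: "&#24" and "&#0" refer to the control characters U+0018 and U+0000. XML 1.0 only allows tab, line feed, carriage return and characters from U+0020 upward (excluding surrogates, U+FFFE and U+FFFF), and this restriction applies even when a character is written as a numeric reference. As a result, any PDF whose extracted text contains such bytes makes the whole document unparseable. One possible workaround is to filter the extracted text before it is written into the XML. The sketch below assumes you can intercept that text somewhere in your own pipeline. The class and method names (XmlCharFilter, stripInvalidXmlChars) are hypothetical and are not part of FedoraGSearch:

```java
// Hypothetical helper, not part of FedoraGSearch: removes code points that the
// XML 1.0 "Char" production forbids, e.g. U+0018 ("&#24") and U+0000 ("&#0").
public final class XmlCharFilter {

    private XmlCharFilter() {}

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of text with every XML-1.0-forbidden code point dropped. */
    public static String stripInvalidXmlChars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        // codePoints() yields unpaired surrogates as-is; they fail the check above
        // and are dropped along with the control characters.
        text.codePoints()
            .filter(XmlCharFilter::isXml10Char)
            .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // Simulated PDF text containing the two characters from the log.
        String extracted = "Page\u0018 one\0 text";
        System.out.println(stripInvalidXmlChars(extracted)); // prints "Page one text"
    }
}
```

This simply deletes the offending characters. Replacing them with a space would be an equally reasonable choice if word boundaries matter for your search results.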

I have heard from other people that this is a known Lucene/Solr parsing
problem. So far I don't know how to solve it. Another problem is that this
error prevents even the metadata of objects with "unusual" PDFs from being
indexed. Please let us know if you find a solution.

Serhiy



On Wed, Apr 14, 2010 at 10:21 AM, tasai Kan <[email protected]> wrote:
> Hello. As the title says, I edited the demoFoxmlToSolr file to index PDF
> files (they are externally referenced content). But it only works for some
> PDF files and not for others. Does it depend on something?
> Thanks in advance.
>
> _______________________________________________
> Fedora-commons-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
>
>
