[ 
https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15356267#comment-15356267
 ] 

Phil edited comment on CONNECTORS-1325 at 6/30/16 12:43 AM:
------------------------------------------------------------

Hi [~daddywri],

After installing the patch, I'm finding that it does ignore the error. However,
the crawler keeps attempting to process this document (or at least its
metadata), so the crawl never finishes. It has now been running for a few days.

I tailed the logs for a particular document using the following:
{{tail -f manifoldcf.log | grep "<DOCUMENT_URL>"}}

This produced the following lines, repeated over and over:
{code}
DEBUG 2016-06-30 09:59:32,928 (Worker thread '13') sharepoint.SharePointRepository - SharePoint: Finding metadata to include for document/item <DOCUMENT_URL>
DEBUG 2016-06-30 09:59:32,946 (Worker thread '13') sharepoint.SPSProxyHelper - SharePoint: In getFieldValues; fieldNames= ....
DEBUG 2016-06-30 09:59:33,100 (Worker thread '27') sharepoint.SharePointRepository - SharePoint: Getting version of <DOCUMENT_URL>
DEBUG 2016-06-30 09:59:33,100 (Worker thread '27') sharepoint.SharePointRepository - SharePoint: Checking whether to include list item ....

.....
....
{code}

I've omitted some repository-specific details, but let me know if you need any
further information.

Any idea why this might be happening?

Thanks


> Invalid XML character causing job to abort
> ------------------------------------------
>
>                 Key: CONNECTORS-1325
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1325
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: SharePoint connector
>    Affects Versions: ManifoldCF 2.3
>            Reporter: Phil
>            Assignee: Karl Wright
>            Priority: Blocker
>             Fix For: ManifoldCF 2.5
>
>         Attachments: CONNECTORS-1325.patch
>
>
> The following error causes the ManifoldCF job to abort, so the job is never
> able to finish.
> It would be good to have the crawler log this error without throwing an
> exception that stops the entire job.
> {code}
> ERROR 2016-06-21 19:01:54,562 (Worker thread '6') system.WorkerThread - Exception tossed: XML parsing error: Character reference "&#xD83D" is an invalid XML character.
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: XML parsing error: Character reference "&#xD83D" is an invalid XML character.
>         at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:390)
>         at org.apache.manifoldcf.core.common.XMLDoc.<init>(XMLDoc.java:286)
>         at org.apache.manifoldcf.crawler.connectors.sharepoint.SPSProxyHelper.getFieldValues(SPSProxyHelper.java:2039)
>         at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:974)
>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> Caused by: org.xml.sax.SAXParseException; lineNumber: 18; columnNumber: 64; Character reference "&#xD83D" is an invalid XML character.
>         at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>         at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>         at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
>         at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:359)
>         ... 4 more
> {code}
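
For context on the error above: 0xD83D lies in the UTF-16 surrogate range (0xD800 to 0xDFFF), which the XML 1.0 {{Char}} production excludes, so a conforming parser such as Xerces rejects a character reference to it. The following is a minimal Java sketch, not the attached patch and not ManifoldCF code, of one way such references could be dropped from a response string before it is parsed. The class {{XmlCharRefSanitizer}} and its methods are hypothetical names introduced only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of ManifoldCF or CONNECTORS-1325.patch.
final class XmlCharRefSanitizer {
    // Matches numeric character references such as &#xD83D; (hex) or &#55357; (decimal).
    private static final Pattern CHAR_REF =
        Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    private XmlCharRefSanitizer() {}

    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXmlChar(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Removes numeric character references whose value is not a legal XML 1.0 character.
    // Valid references and all other text are copied through unchanged.
    static String stripInvalidCharRefs(String xml) {
        Matcher m = CHAR_REF.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            boolean valid;
            try {
                int codePoint = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
                valid = isValidXmlChar(codePoint);
            } catch (NumberFormatException e) {
                // Value does not fit in an int, so it cannot be a legal code point.
                valid = false;
            }
            m.appendReplacement(out, valid ? Matcher.quoteReplacement(m.group()) : "");
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

A call such as {{XmlCharRefSanitizer.stripInvalidCharRefs(responseText)}} would run before the text reaches the XML parser. One trade-off of this approach: a character outside the Basic Multilingual Plane that was encoded as two surrogate references (high and low) is removed entirely rather than repaired, so it favours a parseable document over fidelity.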


