[
https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569211#comment-15569211
]
Konstantin Avdeev commented on CONNECTORS-1325:
-----------------------------------------------
Hi Karl,
I think the issue can be reproduced easily by putting an emoji (e.g. 😀) into
a field of a task list:
{code}
DEBUG 2016-10-12 18:32:47,521 (Worker thread '72') - SharePoint: getListItems
FileRef value 'sites/test-team/Lists/Main Task List/5_.000', xml response:
'<ns1:listitems xmlns:s="uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882"
xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"
xmlns:rs="urn:schemas-microsoft-com:rowset" xmlns:z="#RowsetSchema"
xmlns:ns1="http://schemas.microsoft.com/sharepoint/soap/">
<rs:data ItemCount="1">
<z:row ows_Modified="2016-10-12 17:30:55" ows_Created="2016-10-12 17:30:55"
ows_ID="5" ows_GUID="{E583E8D8-52A7-4CD8-8A5F-6354D57D1E40}" ows_MetaInfo="5;#"
ows__ModerationStatus="0" ows__Level="1" ows_Title="Task emoji >>>😀<<<"
ows_UniqueId="5;#{8F6DF977-9814-4AA0-B7AE-E29838C508CF}"
ows_owshiddenversion="1" ows_FSObjType="5;#0" ows_PermMask="0x7fffffffffffffff"
ows_FileRef="5;#sites/test-team/Lists/Main Task List/5_.000"/>
</rs:data>
</ns1:listitems>'
DEBUG 2016-10-12 18:32:47,522 (Worker thread '72') - SharePoint: Can't get
version of '/Main Task List///5_.000' because of bad XML characters(?)
{code}
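Here is a minimal sketch that reproduces the parse failure outside ManifoldCF using only the JDK. The character reference in the original error report is garbled, so the sketch assumes that the emoji reaches the parser as two character references to its UTF-16 surrogate halves (`&#xD83D;&#xDE00;`) rather than as one reference to the code point (`&#x1F600;`). XML 1.0 allows only the latter. The class `EmojiCharRefRepro` and the helper `parseError` are hypothetical names, not ManifoldCF code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class EmojiCharRefRepro {

    // Parses the XML with the JDK's built-in DOM parser.
    // Returns null if parsing succeeds, otherwise the parser's error message.
    static String parseError(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing "[Fatal Error]" to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // 1. The emoji as literal UTF-8 text: accepted.
        System.out.println(parseError("<row ows_Title=\"Task emoji \uD83D\uDE00\"/>"));
        // 2. One reference to the code point U+1F600: accepted.
        System.out.println(parseError("<row ows_Title=\"Task emoji &#x1F600;\"/>"));
        // 3. Two references to the UTF-16 surrogate halves: rejected, because
        //    XML 1.0 does not allow U+D800..U+DFFF as characters.
        System.out.println(parseError("<row ows_Title=\"Task emoji &#xD83D;&#xDE00;\"/>"));
    }
}
```

Run as-is, the first two cases print `null` and the third prints the parser's "invalid XML character" message. If SharePoint's raw response really uses the surrogate form, that would account for the failure.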
Thanks!
> Invalid XML character causing job to abort
> ------------------------------------------
>
> Key: CONNECTORS-1325
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1325
> Project: ManifoldCF
> Issue Type: Bug
> Components: SharePoint connector
> Affects Versions: ManifoldCF 2.3
> Reporter: Phil
> Assignee: Karl Wright
> Priority: Blocker
> Fix For: ManifoldCF 2.5
>
> Attachments: CONNECTORS-1325-2.patch, CONNECTORS-1325-3.patch,
> CONNECTORS-1325.patch
>
>
> The following error causes the ManifoldCF job to abort, after which the job
> is never able to finish.
> It would be better for the crawler to log this error instead of throwing an
> exception that stops the entire job.
> {code}
> ERROR 2016-06-21 19:01:54,562 (Worker thread '6') system.WorkerThread -
> Exception tossed: XML parsing error: Character reference "�" is an
> invalid XML character.
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: XML parsing error:
> Character reference "�" is an invalid XML character.
> at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:390)
> at org.apache.manifoldcf.core.common.XMLDoc.<init>(XMLDoc.java:286)
> at org.apache.manifoldcf.crawler.connectors.sharepoint.SPSProxyHelper.getFieldValues(SPSProxyHelper.java:2039)
> at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:974)
> at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> Caused by: org.xml.sax.SAXParseException; lineNumber: 18; columnNumber: 64;
> Character reference "�" is an invalid XML character.
> at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
> at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
> at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
> at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:359)
> ... 4 more
> {code}
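The request to log the error instead of aborting the job can be sketched as follows, using the JDK's `DocumentBuilder` and `java.util.logging` in place of ManifoldCF's `XMLDoc` and logging. The sketch assumes that the offending references are UTF-16 surrogate halves such as `&#xD83D;&#xDE00;`; the garbled error message does not confirm this. Under that assumption, it first inlines those references as raw UTF-16 code units so that a pair becomes one real supplementary character. If parsing still fails, it logs a warning and returns `null` so the caller can skip the item. This is not taken from the attached patches. `SurrogateRefSanitizer`, `decodeSurrogateRefs`, and `parseOrLog` are hypothetical names, not ManifoldCF APIs.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SurrogateRefSanitizer {

    private static final Logger LOG = Logger.getLogger(SurrogateRefSanitizer.class.getName());

    // Matches hexadecimal (&#x...;) and decimal (&#...;) character references.
    private static final Pattern CHAR_REF =
            Pattern.compile("&#(?:[xX]([0-9A-Fa-f]{1,6})|([0-9]{1,7}));");

    // Hypothetical pre-parse step: replaces references to UTF-16 surrogate code
    // units with the raw code units, so a pair like &#xD83D;&#xDE00; becomes one
    // real supplementary character. Unpaired surrogates become U+FFFD.
    static String decodeSurrogateRefs(String xml) {
        Matcher m = CHAR_REF.matcher(xml);
        StringBuffer decoded = new StringBuffer();
        while (m.find()) {
            int value = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            String replacement = (value >= 0xD800 && value <= 0xDFFF)
                    ? String.valueOf((char) value) // surrogate half: inline it
                    : m.group();                   // any other reference: keep as-is
            m.appendReplacement(decoded, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(decoded);

        // Keep well-formed surrogate pairs; replace any leftover half with U+FFFD.
        StringBuilder out = new StringBuilder(decoded.length());
        for (int i = 0; i < decoded.length(); i++) {
            char c = decoded.charAt(i);
            if (Character.isHighSurrogate(c) && i + 1 < decoded.length()
                    && Character.isLowSurrogate(decoded.charAt(i + 1))) {
                out.append(c).append(decoded.charAt(++i));
            } else if (Character.isSurrogate(c)) {
                out.append('\uFFFD');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Parses the sanitized XML; on failure, logs a warning and returns null so
    // the caller can skip this one item instead of aborting the whole job.
    static Document parseOrLog(String xml, String itemId) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrow fatal errors, don't print them
            return builder.parse(new InputSource(new StringReader(decodeSurrogateRefs(xml))));
        } catch (SAXException | IOException | ParserConfigurationException e) {
            LOG.log(Level.WARNING, "Skipping item " + itemId + ": unparseable XML response", e);
            return null;
        }
    }

    public static void main(String[] args) {
        Document doc = parseOrLog("<row ows_Title=\"Task emoji &#xD83D;&#xDE00;\"/>", "5");
        String title = doc.getDocumentElement().getAttribute("ows_Title");
        // The last code point of the title is the emoji U+1F600.
        System.out.println(Integer.toHexString(title.codePointAt(title.length() - 2))); // prints 1f600
    }
}
```

Returning `null` keeps the decision to skip with the caller, which matches the request to log the problem and continue. Note that the regex pass also rewrites references inside comments or CDATA sections, which is harmless for this purpose but worth knowing.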
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)