Donald Van den Driessche created CONNECTORS-1579:
----------------------------------------------------
Summary: Error when crawling a MSSQL table
Key: CONNECTORS-1579
URL: https://issues.apache.org/jira/browse/CONNECTORS-1579
Project: ManifoldCF
Issue Type: Bug
Components: JDBC connector
Affects Versions: ManifoldCF 2.12
Reporter: Donald Van den Driessche
Attachments: 636_bb2.csv
When I crawl an MSSQL table through the JDBC connector, I get the following
error for multiple rows:
{noformat}
FATAL 2019-02-05T13:21:58,929 (Worker thread '40') - Error tossed: Multiple document primary component dispositions not allowed: document '636'
java.lang.IllegalStateException: Multiple document primary component dispositions not allowed: document '636'
    at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.checkMultipleDispositions(WorkerThread.java:2125) ~[mcf-pull-agent.jar:?]
    at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.noDocument(WorkerThread.java:1624) ~[mcf-pull-agent.jar:?]
    at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.noDocument(WorkerThread.java:1605) ~[mcf-pull-agent.jar:?]
    at org.apache.manifoldcf.crawler.connectors.jdbc.JDBCConnector.processDocuments(JDBCConnector.java:944) ~[?:?]
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
{noformat}
I searched for this error online, and what I found suggests it may be caused by
the same key being used for different rows.
I checked, but I could not find any duplicates in any of the fields selected by
the JDBC queries.
Here are my queries:
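A duplicate check of this kind can be sketched as follows. This is only an
illustration, assuming pk1 is the key column as in the queries in this report;
it is not necessarily the exact statement that was run.
{code:sql}
-- Sketch: list pk1 values that occur more than once in dbo.bb2.
-- An empty result means pk1 is unique in the table.
SELECT pk1, COUNT(*) AS row_count
FROM dbo.bb2
GROUP BY pk1
HAVING COUNT(*) > 1;
{code}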
Seeding query
{code:sql}
SELECT pk1 AS $(IDCOLUMN)
FROM dbo.bb2
WHERE search_url IS NOT NULL
  AND mimetype IS NOT NULL
  AND mimetype NOT IN ('unknown/unknown', 'application/xml', 'application/zip');
{code}
Version check query: none
Access token query: none
Data query:
{code:sql}
SELECT
    pk1 AS $(IDCOLUMN),
    search_url AS $(URLCOLUMN),
    ISNULL(content, '') AS $(DATACOLUMN),
    doc_id,
    search_url AS url,
    ISNULL(title, '') AS title,
    ISNULL(groups, '') AS groups,
    ISNULL(type, '') AS document_type,
    ISNULL(users, '') AS users
FROM dbo.bb2
WHERE pk1 IN $(IDLIST);
{code}
The attached CSV file contains the corresponding row from the table.
[^636_bb2.csv]
Could you help me understand this error?