[jira] [Commented] (CONNECTORS-1579) Error when crawling a MSSQL table

Karl Wright (JIRA) Fri, 08 Feb 2019 00:11:39 -0800


    [ 
https://issues.apache.org/jira/browse/CONNECTORS-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763395#comment-16763395
 ]


Karl Wright commented on CONNECTORS-1579:
-----------------------------------------

You can either check out the entire current trunk source code and build that, 
or download the release source and libs, apply the patch, and build that.  
Which do you want to do?


> Error when crawling a MSSQL table
> ---------------------------------
>
>                 Key: CONNECTORS-1579
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1579
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: JDBC connector
>    Affects Versions: ManifoldCF 2.12
>            Reporter: Donald Van den Driessche
>            Assignee: Karl Wright
>            Priority: Major
>             Fix For: ManifoldCF 2.13
>
>         Attachments: 636_bb2.csv, CONNECTORS-1579.patch
>
>
> When I crawl an MSSQL table through the JDBC connector, I get the following 
> error for multiple lines:
>  
> {noformat}
> FATAL 2019-02-05T13:21:58,929 (Worker thread '40') - Error tossed: Multiple document primary component dispositions not allowed: document '636'
> java.lang.IllegalStateException: Multiple document primary component dispositions not allowed: document '636'
> at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.checkMultipleDispositions(WorkerThread.java:2125) ~[mcf-pull-agent.jar:?]
> at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.noDocument(WorkerThread.java:1624) ~[mcf-pull-agent.jar:?]
> at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.noDocument(WorkerThread.java:1605) ~[mcf-pull-agent.jar:?]
> at org.apache.manifoldcf.crawler.connectors.jdbc.JDBCConnector.processDocuments(JDBCConnector.java:944) ~[?:?]
> at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
> {noformat}
> I looked this error up online, and it may have something to do with using 
> the same key for different lines.
> I checked, but I couldn't find any duplicates in any of the fields selected 
> by the JDBC queries.
> Here are my queries:
> Seeding query:
> {code:sql}
> SELECT pk1 as $(IDCOLUMN)
> FROM dbo.bb2
> WHERE search_url IS NOT NULL
> AND mimetype IS NOT NULL
> AND mimetype NOT IN ('unknown/unknown', 'application/xml', 'application/zip');
> {code}
> Version check query: none
> Access token query: none
> Data query:
> {code:sql}
> SELECT 
> pk1 AS $(IDCOLUMN), 
> search_url AS $(URLCOLUMN), 
> ISNULL(content, '') AS $(DATACOLUMN),
> doc_id, 
> search_url AS url, 
> ISNULL(title, '') as title, 
> ISNULL(groups,'') as groups, 
> ISNULL(type,'') as document_type, 
> ISNULL(users, '') as users
> FROM dbo.bb2
> WHERE pk1 IN $(IDLIST);
> {code}
> The attached CSV contains the corresponding line from the table.
> [^636_bb2.csv]
>  
> Due to this problem, the whole crawling pipeline is being held up. It keeps 
> on retrying this line.
> Could you help me understand this error?
>  
>  
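The duplicate-key check mentioned in the report can be sketched as follows. 
This is only an illustration under stated assumptions: it uses an in-memory 
SQLite table as a stand-in for dbo.bb2, the rows and URLs are invented, and 
the helper names find_duplicate_ids and build_demo_db are hypothetical. Only 
the pk1 and search_url column names come from the queries above. The 
GROUP BY ... HAVING query itself is standard SQL and should also run against 
MSSQL.

```python
import sqlite3


def find_duplicate_ids(conn, table, id_column):
    """Return (id, count) pairs for every id that occurs more than once."""
    # Table and column names cannot be bound as parameters, so they are
    # interpolated here; pass only trusted identifiers.
    query = (
        f"SELECT {id_column}, COUNT(*) AS n "
        f"FROM {table} "
        f"GROUP BY {id_column} "
        f"HAVING COUNT(*) > 1 "
        f"ORDER BY {id_column}"
    )
    return conn.execute(query).fetchall()


def build_demo_db():
    """Create a stand-in table with one deliberately duplicated key (invented data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bb2 (pk1 INTEGER, search_url TEXT)")
    conn.executemany(
        "INSERT INTO bb2 (pk1, search_url) VALUES (?, ?)",
        [
            (635, "http://example.invalid/a"),
            (636, "http://example.invalid/b"),
            (636, "http://example.invalid/c"),  # same key, different row
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(find_duplicate_ids(conn, "bb2", "pk1"))
```

An empty result for the id column would suggest that duplicate keys in the 
table are not the cause, which matches what the reporter observed.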



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
