I ran and then deleted the job in question on ManifoldCF 2.4 and on current trunk (2.5-dev with CONNECTORS-1323.patch). Our problem can be reproduced with 2.4 and seems to be resolved in the trunk version.

Operation:
1. Create a job with the eight outputs below:
   - ds_solr_forum_en-eu
   - ds_solr_forum_en-in
   - ds_solr_forum_en-sg
   - ds_solr_forum_en-us
   - ds_solr_forum_ko-kr_en
   - ds_solr_forum_zh-cn_en
   - ds_solr_forum_zh-tw_en
   - ds_solr_forum_pt-br_en
2. Run the job for a while.
3. Abort the job.
4. Delete the job.

With ManifoldCF 2.4, an SQLException and stack trace (below) were logged and the job remained in "clean up" status.

ERROR 2016-06-14 09:33:19,714 (Document delete thread '0') - Document delete thread aborting and restarting due to database connection reset: Database exception: SQLException doing query (22001): ERROR: value too long for type character varying(64)
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: SQLException doing query (22001): ERROR: value too long for type character varying(64)
    at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.finishUp(Database.java:715)
    at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:741)
    at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:803)
    at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)
    at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
    at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
    at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performModification(DBInterfacePostgreSQL.java:661)
    at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performInsert(DBInterfacePostgreSQL.java:187)
    at org.apache.manifoldcf.core.database.BaseTable.performInsert(BaseTable.java:68)
    at org.apache.manifoldcf.crawler.repository.RepositoryHistoryManager.addRow(RepositoryHistoryManager.java:203)
    at org.apache.manifoldcf.crawler.repository.RepositoryConnectionManager.recordHistory(RepositoryConnectionManager.java:706)
    at org.apache.manifoldcf.crawler.system.DocumentDeleteThread$OutputRemoveActivity.recordActivity(DocumentDeleteThread.java:295)
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$OutputRecordingActivity.recordActivity(IncrementalIngester.java:2383)
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$OutputRecordingActivity.recordActivity(IncrementalIngester.java:2383)
    at org.apache.manifoldcf.agents.output.solr.HttpPoster.deletePost(HttpPoster.java:720)
    at org.apache.manifoldcf.agents.output.solr.SolrConnector.removeDocument(SolrConnector.java:605)
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.removeDocument(IncrementalIngester.java:2306)
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentDeleteMultiple(IncrementalIngester.java:1042)

With 2.5-dev, there were no errors and the job was completely removed. Thank you for the fix.

I want to apply the same fix (CONNECTORS-1323) to ManifoldCF 2.2, because our production system cannot be upgraded to the latest version immediately, though we should plan to do so. I'll try it.

Best regards,
Tomoko

2016-06-13 18:34 GMT+09:00 Tomoko Uchida <[email protected]>:
> And some additional information are here.
>
> I use ManifoldCF 2.2.
>
>> (1) Which underlying database are you using?
>
> I use PostgreSQL 9.4.5
>
>> (2) Have you modified the MCF schema in any way?
>
> No. I did not modify any MCF db schema.
>
>> (3) What are the actual names of the output connections in question?
>
> For example, a job has 8 outputs below. There are other jobs that
> cannot be deleted by same reason.
> - ds_solr_forum_en-eu
> - ds_solr_forum_en-in
> - ds_solr_forum_en-sg
> - ds_solr_forum_en-us
> - ds_solr_forum_ko-kr_en
> - ds_solr_forum_zh-cn_en
> - ds_solr_forum_zh-tw_en
> - ds_solr_forum_pt-br_en
>
> For business requirements, I crawl a web site and post to multiple
> (eight) solr cores.
>
> Whole job definition is below (I deleted seeds/includes/excludes URLs
> from the original json data):
>
> {
>   "job": {
>     "description": "ds_forum_en",
>     "document_specification": {
>       "excludes": "…",
>       "excludescontentindex": "",
>       "excludesindex": "",
>       "includes": "…",
>       "includesindex": ".*",
>       "limittoseeds": {
>         "_attribute_value": "true",
>         "_value_": ""
>       },
>       "seeds": "…"
>     },
>     "expiration_interval": "infinite",
>     "hopcount_mode": "accurate",
>     "id": "1464673266530",
>     "pipelinestage": [
>       {
>         "stage_connectionname": "ds_solr_forum_en-eu",
>         "stage_id": "0",
>         "stage_isoutput": "true",
>         "stage_specification": {}
>       },
>       {
>         "stage_connectionname": "ds_solr_forum_en-in",
>         "stage_id": "1",
>         "stage_isoutput": "true",
>         "stage_specification": {}
>       },
>       {
>         "stage_connectionname": "ds_solr_forum_en-sg",
>         "stage_id": "2",
>         "stage_isoutput": "true",
>         "stage_specification": {}
>       },
>       {
>         "stage_connectionname": "ds_solr_forum_en-us",
>         "stage_id": "3",
>         "stage_isoutput": "true",
>         "stage_specification": {}
>       },
>       {
>         "stage_connectionname": "ds_solr_forum_ko-kr_en",
>         "stage_id": "4",
>         "stage_isoutput": "true",
>         "stage_specification": {}
>       },
>       {
>         "stage_connectionname": "ds_solr_forum_zh-cn_en",
>         "stage_id": "5",
>         "stage_isoutput": "true",
>         "stage_specification": {}
>       },
>       {
>         "stage_connectionname": "ds_solr_forum_zh-tw_en",
>         "stage_id": "6",
>         "stage_isoutput": "true",
>         "stage_specification": {}
>       },
>       {
>         "stage_connectionname": "ds_solr_forum_pt-br_en",
>         "stage_id": "7",
>         "stage_isoutput": "true",
>         "stage_specification": {}
>       }
>     ],
>     "priority": "5",
>     "recrawl_interval": "86400000",
>     "repository_connection": "ds_forum_en",
>     "reseed_interval": "3600000",
>     "run_mode": "continuous",
>     "start_mode": "manual"
>   }
> }
>
> Thank you,
> Tomoko
>
> 2016-06-13 18:09 GMT+09:00 Tomoko Uchida <[email protected]>:
>> Hi Karl,
>>
>> Thank you for rapid response! I'll try the patch soon.
>>
>> Regards,
>> Tomoko
>>
>> 2016-06-13 16:20 GMT+09:00 Karl Wright <[email protected]>:
>>> Ok, some further exploration yields the following:
>>> (1) A check was put into the code a while ago to prevent overly long
>>> activity names from blowing things up. That is why we no longer see this
>>> problem.
>>> (2) There was a problem with activity logging for deletions across multiple
>>> output connections. See CONNECTORS-1323. I've provided a patch.
>>>
>>> Karl
>>>
>>>
>>> On Mon, Jun 13, 2016 at 1:55 AM, Karl Wright <[email protected]> wrote:
>>>
>>>> Hi Tomoko,
>>>>
>>>> Sorry, I missed this post when it was originally made.
>>>>
>>>> The activitytype column is provided by the framework for only a small
>>>> number of specific events. In no case does the activitytype contain
>>>> anything other than a fixed-length string; it's meant to be queried on.
>>>> That string may include the name of a single output connection or of a
>>>> transformation connection, but only one. The maximum length of an output
>>>> or transformation connection name is 32, so the total length available for
>>>> the rest of the activitytype column is 30.
>>>>
>>>> The string "document deletion" is 17 characters, so that's nowhere near
>>>> the limit here. So this makes no sense.
>>>>
>>>> Can you be more specific about the following:
>>>>
>>>> (1) Which underlying database are you using?
>>>> (2) Have you modified the MCF schema in any way?
>>>> (3) What are the actual names of the output connections in question?
>>>>
>>>> Thanks,
>>>> Karl
>>>>
>>>>
>>>> On Sun, Jun 12, 2016 at 10:42 PM, Tomoko Uchida <[email protected]> wrote:
>>>>
>>>>> Hi, any suggestions?
>>>>>
>>>>> Is this a known limitation, or
>>>>> should I create a ticket about that?
>>>>>
>>>>> Thanks,
>>>>> Tomoko
>>>>>
>>>>> 2016-06-09 10:44 GMT+09:00 Tomoko Uchida <[email protected]>:
>>>>> > Hello developers,
>>>>> >
>>>>> > I have sent same message to the user mailing list but there are no
>>>>> > reply. Could anyone help me?
>>>>> > Some jobs in our customer production environment no longer cannot be
>>>>> > deleted for this problem.
>>>>> >
>>>>> > We are looking for solutions to delete the jobs safely.
>>>>> > If my question was not clear, I am ready to provide more detailed explanation.
>>>>> >
>>>>> > ----
>>>>> >
>>>>> > Hello,
>>>>> > I encountered an SQLException when I deleted a job with many output connections.
>>>>> >
>>>>> > ERROR 2016-06-02 09:41:49,492 (Document delete thread '9') - Document
>>>>> > delete thread aborting and restarting due to database connection
>>>>> > reset: Database exception: SQLException doing query (22001): ERROR:
>>>>> > value too long for type character varying(64)
>>>>> >
>>>>> >
>>>>> > I've found that the error occurred because of ManifoldCF trying to
>>>>> > insert long string (more than 64 characters) to 'activitytype' column
>>>>> > of 'repohistory' table while deleting documents associated with the
>>>>> > job.
>>>>> >
>>>>> > For a trial, I altered 'activitytype' column type to 'text' by this
>>>>> > sentence.
>>>>> >
>>>>> > ALTER TABLE repohistory ALTER COLUMN activitytype TYPE text;
>>>>> >
>>>>> > After altering the table I restarted ManifoldCF then the deletion
>>>>> > histories was successfully added and the job seemed to be safely
>>>>> > deleted.
>>>>> >
>>>>> > Inserted 'activitytype' values are like this:
>>>>> > document deletion (outputA) (outputB) (outputC) (outputD) (outputE) ...
>>>>> >
>>>>> > For application requirements, I cannot limit the number of output
>>>>> > connectors (to shorten history records.)
>>>>> >
>>>>> > Is that OK? Or there are good solutions for that?
>>>>> >
>>>>> > Thank you in advance,
>>>>> > Tomoko
>>>>>
>>>>
>>>>
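
For reference, the length arithmetic behind the error can be sketched as follows. This is only an illustration, assuming the pre-fix activity type is built by appending " (name)" for every output connection to "document deletion", which is what the inserted values quoted in this thread look like. The function names are hypothetical and are not ManifoldCF APIs.

```python
# Hypothetical reconstruction of the activity type string; the format is
# inferred from the values quoted in this thread, not from ManifoldCF source.
ACTIVITYTYPE_LIMIT = 64  # repohistory.activitytype is character varying(64)

OUTPUTS = [
    "ds_solr_forum_en-eu",
    "ds_solr_forum_en-in",
    "ds_solr_forum_en-sg",
    "ds_solr_forum_en-us",
    "ds_solr_forum_ko-kr_en",
    "ds_solr_forum_zh-cn_en",
    "ds_solr_forum_zh-tw_en",
    "ds_solr_forum_pt-br_en",
]


def build_activity_type(base, outputs):
    """Append ' (name)' for each output connection to the base activity name."""
    return base + "".join(f" ({name})" for name in outputs)


def fits(activity_type, limit=ACTIVITYTYPE_LIMIT):
    """Return True if the string would fit in the varchar column."""
    return len(activity_type) <= limit


# One output connection per record: the case the column was sized for.
single = build_activity_type("document deletion", OUTPUTS[:1])
# All eight output connections in one record: the case seen in this thread.
combined = build_activity_type("document deletion", OUTPUTS)

print(len(single), fits(single))
print(len(combined), fits(combined))
```

With a single output connection the value stays within the 64-character column, while the combined value for all eight connections is well over it, which is consistent with the "value too long" error on 2.4.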
