Sorry, I wasn't clear.

The dataManager initialization can certainly occur in every
setThreadContext method call because that's when you have the
threadContext available.  But as long as you save the thread context
when setThreadContext is called, you can certainly do it in
getSession() instead - perfectly OK.  But setting up dataManager at
setThreadContext time is a fine way to go and ties this occurrence
directly to setThreadContext, which is probably a bit better.
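
As a rough sketch of tying dataManager to the thread context, assuming
hypothetical stand-in types (ThreadContext and DataManager below are
simplified placeholders for illustration, not the real ManifoldCF
classes), the set/clear pairing could look something like this:

```java
// Hypothetical stand-in for the framework's thread context type.
interface ThreadContext {}

// Hypothetical stand-in for the connector's own data manager class.
class DataManager {
    final ThreadContext context;

    DataManager(ThreadContext context) {
        this.context = context;
    }
}

class ContextBoundConnector {
    // Instance fields, not static: each connector instance owns its own copy.
    private ThreadContext currentContext;
    private DataManager dataManager;

    // Called when the instance is handed to a thread: save the context
    // and build the object that depends on it.
    void setThreadContext(ThreadContext tc) {
        currentContext = tc;
        dataManager = new DataManager(tc);
    }

    // Called when the instance is released: drop everything that was
    // bound to the old thread context.
    void clearThreadContext() {
        dataManager = null;
        currentContext = null;
    }

    DataManager dataManager() {
        return dataManager;
    }
}
```

Because the fields are per-instance, clearing one connector instance
does not disturb any other instance's dataManager.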

Now, getSession() is typically used for a different purpose - namely
to set up a connection to your target repository which has some
lifetime.  Thus if (say) you need an HttpClient object, creating that
object (or grabbing it from some HttpClient pool) probably should take
place in getSession().  Since we've stipulated that the HttpClient
object should be released back into the pool when idle, you should
also keep track of the time (in getSession()) when the HttpClient was
last needed, so you know when it should be released during the poll()
operation.
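
A minimal sketch of that bookkeeping follows.  Everything in it is an
assumption made for illustration: the class name, the 60-second idle
timeout, the injected clock, and the plain Object standing in for an
HttpClient taken from a pool.

```java
import java.util.function.LongSupplier;

class IdleTrackingSession {
    // Assumed idle timeout; pick whatever suits the target repository.
    static final long IDLE_TIMEOUT_MS = 60_000L;

    private final LongSupplier clock; // injected so the timing can be tested
    private Object client;            // stand-in for a pooled HttpClient
    private long lastUsedMs;
    int clientsCreated = 0;

    IdleTrackingSession(LongSupplier clock) {
        this.clock = clock;
    }

    // Create (or fetch) the client on demand and note when it was last needed.
    Object getSession() {
        if (client == null) {
            client = new Object(); // real code would take one from a pool
            clientsCreated++;
        }
        lastUsedMs = clock.getAsLong();
        return client;
    }

    // Called periodically: release the client once it has been idle long enough.
    void poll() {
        if (client != null && clock.getAsLong() - lastUsedMs >= IDLE_TIMEOUT_MS) {
            client = null; // real code would return it to the pool here
        }
    }

    boolean hasSession() {
        return client != null;
    }
}
```

In production the clock would simply be System::currentTimeMillis; it
is a parameter here only so the idle logic can be exercised without
waiting.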

But I was basically making the case that setThreadContext() operates
on a different time scale than getSession().  setThreadContext() is
tied to when the class instance is pulled out of the class instance
pool, but there is nothing in the contract that says that ManifoldCF
can't call multiple connector methods with the same thread.  So a
connector that expects setThreadContext() to be called before every
addOrReplaceDocument() is making a mistake.  setThreadContext() will
only be called when the connector instance is pulled from the
connector instance pool, which must occur before addOrReplaceDocument
is called, but many connector method invocations may follow that
single call.
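
To illustrate the two time scales, here is a small sketch in which
setThreadContext() only saves the context and addOrReplaceDocument()
obtains what it needs through getSession() on every call.  The types
CallerContext and DocumentStore are hypothetical placeholders, not
ManifoldCF classes, and the method signatures are simplified.

```java
// Hypothetical stand-in for the framework's thread context type.
interface CallerContext {}

// Hypothetical stand-in for whatever the connector writes documents into.
class DocumentStore {
    final CallerContext context;
    int inserted = 0;

    DocumentStore(CallerContext context) {
        this.context = context;
    }

    void insert(String documentUri) {
        inserted++;
    }
}

class LazySessionConnector {
    private CallerContext savedContext;
    private DocumentStore store;
    int storesCreated = 0;

    // Happens once per checkout from the instance pool: just remember the context.
    void setThreadContext(CallerContext tc) {
        savedContext = tc;
    }

    void clearThreadContext() {
        store = null;
        savedContext = null;
    }

    // Build the context-dependent object only when it is first needed.
    private DocumentStore getSession() {
        if (store == null) {
            if (savedContext == null) {
                throw new IllegalStateException("setThreadContext was never called");
            }
            store = new DocumentStore(savedContext);
            storesCreated++;
        }
        return store;
    }

    // May be called many times between one set/clear pair.
    void addOrReplaceDocument(String documentUri) {
        getSession().insert(documentUri);
    }
}
```

With this arrangement the connector makes no assumption about how
often setThreadContext() is called relative to addOrReplaceDocument().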

The book describes this pictorially in figure 6.3.  You just need to
imagine that every time a thread obtains a connection handle it may
well hold onto it for an extended period of time.

Karl

On Tue, Jun 7, 2011 at 8:49 PM, Farzad Valad <[email protected]> wrote:
> I think I get it now why you said put getSession in addOrReplaceDocument.
>  This way you construct the dataManager when you need it as opposed to each
> set and clear pair : )
>
> On 6/7/2011 5:10 PM, Farzad Valad wrote:
>>
>> I don't fully understand how a connector instance can be used by multiple
>> threads without each thread calling setThread.  Here is what I think I know.
>>
>> The contract does say that addOrReplaceDocument is only called after
>> setTC, right?  Because you first have to have connection handle before a
>> particular manifold thread can use it.  So if I create my dataManager in
>> set, all will be well in addOrReplaceDocument.  The other caveat is that
>> I'll make dataManager a class variable, instead of static.  So each object
>> would have its own instance with its TC, and in clearTC they'd be nulling
>> their version and not anyone else's.
>>
>> Do I get it?
>>
>> On 6/7/2011 5:00 PM, Karl Wright wrote:
>>>
>>> The recommendation to have getSession be called in
>>> addOrReplaceDocument is because there is nothing in the contract which
>>> states that the connector instance will switch threads between calls.
>>> Therefore there is no guarantee that
>>> clearThreadContext/setThreadContext will be called right prior to
>>> addOrReplaceDocument.  The two aspects of the interface are therefore
>>> independent of one another, and it would be poor coding to rely on
>>> something that is not in the contract.
>>>
>>> Karl
>>>
>>> On Tue, Jun 7, 2011 at 5:52 PM, Farzad Valad<[email protected]>  wrote:
>>>>
>>>> Thanks for the confirmation.  So if I have code to set dataManager to
>>>> null in clearThreadContext and create a dataManager in
>>>> setThreadContext, why do I need the getSession method in the
>>>> addOrReplaceDocument method?  From what I learnt about ManifoldCF
>>>> architecture, setThreadContext will get called before
>>>> addOrReplaceDocument.  This was something you recommended when I was
>>>> asking about the third party repository.
>>>>
>>>> Farzad.
>>>>
>>>> On 6/7/2011 4:35 PM, Karl Wright wrote:
>>>>>
>>>>> It sounds like you are on the right track for fixing all of these
>>>>> problems.
>>>>>
>>>>> Karl
>>>>>
>>>>> On Tue, Jun 7, 2011 at 4:38 PM, Farzad Valad<[email protected]>
>>>>>  wrote:
>>>>>>
>>>>>> I think I found the problem.  I should be tearing down the dataManager
>>>>>> and recreating it between clear and set thread context calls, because
>>>>>> it has a thread context.  I'm not doing that.  I guess I did learn
>>>>>> something reading : ) let me know if you believe otherwise.  Also do
>>>>>> you think this is why the bad transaction id is happening?  Thanks!
>>>>>>
>>>>>>   IDBInterface databaseHandle = DBInterfaceFactory.make(
>>>>>>       currentContext,
>>>>>>       ManifoldCF.getMasterDatabaseName(),
>>>>>>       ManifoldCF.getMasterDatabaseUsername(),
>>>>>>       ManifoldCF.getMasterDatabasePassword());
>>>>>>   dataManager = new DataManager(currentContext, databaseHandle);
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 6/7/2011 12:42 PM, Farzad Valad wrote:
>>>>>>>
>>>>>>> So I think I figured it out.  For some reason I'm getting a db error,
>>>>>>> bad transaction id, which then kills my dataManager object, or I
>>>>>>> should say the framework is setting it to null.  What does a Bad
>>>>>>> transaction ID mean?  Thoughts?  This happened after I did a LockClean
>>>>>>> and restarted both the agent and Tomcat.  Thanks, Farzad.
>>>>>>>
>>>>>>> ERROR 2011-06-07 11:44:56,365 [Worker thread '90']
>>>>>>> (CacheManager.java:621)
>>>>>>> - Thread[Worker thread '90',5,main]: invalidateKeys: 1307465096157:
>>>>>>> org.apache.manifoldcf.core.cachemanager.CacheManager@13b0c258:
>>>>>>> Transaction
>>>>>>> hash =
>>>>>>>
>>>>>>>
>>>>>>> {1307465096144=org.apache.manifoldcf.core.cachemanager.CacheManager$CacheTransactionHandle@39a72981}
>>>>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Bad
>>>>>>> transaction
>>>>>>> ID!
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.core.cachemanager.CacheManager.invalidateKeys(CacheManager.java:620)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:175)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:168)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performModification(DBInterfacePostgreSQL.java:637)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performInsert(DBInterfacePostgreSQL.java:191)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.core.database.BaseTable.performInsert(BaseTable.java:76)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.agents.output.dupfinder.DataManager.insertData(DataManager.java:115)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.agents.output.dupfinder.DupFinderConnector.addOrReplaceDocument(DupFinderConnector.java:158)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.addOrReplaceDocument(IncrementalIngester.java:1433)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:418)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:313)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1565)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector.processDocuments(FileConnector.java:275)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:564)
>>>>>>> FATAL 2011-06-07 11:44:56,583 [Worker thread '32']
>>>>>>> (DupFinderConnector.java:155) - DATAMANAGER IS NULL!!!!
>>>>>>> ERROR 2011-06-07 11:44:56,599 [Worker thread '90']
>>>>>>> (WorkerThread.java:893)
>>>>>>> - Exception tossed: Bad transaction ID!
>>>>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Bad
>>>>>>> transaction
>>>>>>> ID!
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.core.cachemanager.CacheManager.invalidateKeys(CacheManager.java:620)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:175)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:168)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performModification(DBInterfacePostgreSQL.java:637)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performInsert(DBInterfacePostgreSQL.java:191)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.core.database.BaseTable.performInsert(BaseTable.java:76)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.agents.output.dupfinder.DataManager.insertData(DataManager.java:115)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.agents.output.dupfinder.DupFinderConnector.addOrReplaceDocument(DupFinderConnector.java:158)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.addOrReplaceDocument(IncrementalIngester.java:1433)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:418)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:313)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1565)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector.processDocuments(FileConnector.java:275)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:564)
>>>>>>> FATAL 2011-06-07 11:44:56,614 [Worker thread '32']
>>>>>>> (WorkerThread.java:955)
>>>>>>> - Error tossed: null
>>>>>>> java.lang.NullPointerException
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.agents.output.dupfinder.DupFinderConnector.addOrReplaceDocument(DupFinderConnector.java:158)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.addOrReplaceDocument(IncrementalIngester.java:1433)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:418)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:313)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1565)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector.processDocuments(FileConnector.java:275)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
>>>>>>>    at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:564)
>>>>>>>  INFO 2011-06-07 11:44:56,645 [Worker thread '92']
>>>>>>> (DupFinderConnector.java:251) - Attempting to initialize
>>>>>>> dataManager(null)
>>>>>>> and ciConnector(null)
>>>>>>>
>>>>>>>
>>>>>>> On 6/7/2011 9:20 AM, Farzad Valad wrote:
>>>>>>>>
>>>>>>>> Lately when I issue an abort on a crawl job (click abort in UI), it
>>>>>>>> gets stuck, meaning the UI doesn't show any new info on subsequent
>>>>>>>> refreshes.  It just says Aborting, the start time, no end time, shows
>>>>>>>> # of documents, active, and processed.  I restarted Tomcat, but still
>>>>>>>> stuck in Aborting state.  Restarting the Agent process doesn't have
>>>>>>>> any effect.  But now if you kill the agent process and issue lock
>>>>>>>> clean, then start the Agent Process, it will show an Error in the
>>>>>>>> Status column, but no end time.  Ironically, this time the problem was
>>>>>>>> a bad transaction id.  The last time it was a connection refusal to my
>>>>>>>> repository.  Thoughts?
>>>>>>>>
>>>>>>>> PS.  Previous problem, you were right, dataManager is going null for
>>>>>>>> some reason, actually debugging for dataManager I ran into this one : )
>>>>>>>>
>>>>>>>> ERROR 2011-06-07 08:50:01,416 [Worker thread '64']
>>>>>>>> (CacheManager.java:621) - Thread[Worker thread '64',5,main]:
>>>>>>>> invalidateKeys:
>>>>>>>> 1307454600471:
>>>>>>>> org.apache.manifoldcf.core.cachemanager.CacheManager@39d7af3:
>>>>>>>> Transaction hash = {}
>>>>>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Bad
>>>>>>>> transaction ID!
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.core.cachemanager.CacheManager.invalidateKeys(CacheManager.java:620)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:175)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:168)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performModification(DBInterfacePostgreSQL.java:637)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performInsert(DBInterfacePostgreSQL.java:191)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.core.database.BaseTable.performInsert(BaseTable.java:76)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.agents.output.dupfinder.DataManager.insertData(DataManager.java:115)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.agents.output.dupfinder.DupFinderConnector.addOrReplaceDocument(DupFinderConnector.java:162)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.addOrReplaceDocument(IncrementalIngester.java:1433)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:418)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:313)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1565)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector.processDocuments(FileConnector.java:275)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:564)
>>>>>>>> ERROR 2011-06-07 08:50:01,510 [Worker thread '64']
>>>>>>>> (WorkerThread.java:893) - Exception tossed: Bad transaction ID!
>>>>>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Bad
>>>>>>>> transaction ID!
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.core.cachemanager.CacheManager.invalidateKeys(CacheManager.java:620)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:175)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:168)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performModification(DBInterfacePostgreSQL.java:637)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performInsert(DBInterfacePostgreSQL.java:191)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.core.database.BaseTable.performInsert(BaseTable.java:76)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.agents.output.dupfinder.DataManager.insertData(DataManager.java:115)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.agents.output.dupfinder.DupFinderConnector.addOrReplaceDocument(DupFinderConnector.java:162)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.addOrReplaceDocument(IncrementalIngester.java:1433)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:418)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:313)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1565)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector.processDocuments(FileConnector.java:275)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
>>>>>>>>    at
>>>>>>>>
>>>>>>>>
>>>>>>>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:564)
>>>>>>>>
>>>>
>>
>
>
