Yes Karl, that is exactly what I was thinking: first check whether the credentials are valid, before scanning all the documents. Per-file permissions depend on users and groups, but in the current scenario the user itself is not being invalidated, only that user's access.
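A minimal sketch of that distinction in Java, under stated assumptions: it assumes a failed access can be reduced to a Windows NT status code, which still needs to be verified by experiment against the Samba library. The class, enum, and method names (`CredentialCheckSketch`, `Action`, `classify`) are hypothetical and are not part of ManifoldCF or jCIFS; only the numeric values are the standard NTSTATUS codes. Treating every other failure (for example an unreachable server) as "retry later" is also an assumption.

```java
// Hypothetical sketch: none of these names exist in ManifoldCF or jCIFS.
final class CredentialCheckSketch {

    /** What the crawler could do after a failed access (illustrative only). */
    enum Action { ABORT_JOB, SKIP_DOCUMENT, RETRY_LATER }

    // Standard Windows NTSTATUS values.
    static final int STATUS_ACCESS_DENIED    = 0xC0000022;
    static final int STATUS_WRONG_PASSWORD   = 0xC000006A;
    static final int STATUS_LOGON_FAILURE    = 0xC000006D;
    static final int STATUS_PASSWORD_EXPIRED = 0xC0000071;
    static final int STATUS_ACCOUNT_DISABLED = 0xC0000072;

    /** Maps a status code to an action; the mapping itself is an assumption. */
    static Action classify(int ntStatus) {
        switch (ntStatus) {
            // The credentials themselves are invalid: stop before touching any document.
            case STATUS_WRONG_PASSWORD:
            case STATUS_LOGON_FAILURE:
            case STATUS_PASSWORD_EXPIRED:
            case STATUS_ACCOUNT_DISABLED:
                return Action.ABORT_JOB;
            // Credentials are valid, but this document is not readable: skip it.
            case STATUS_ACCESS_DENIED:
                return Action.SKIP_DOCUMENT;
            // Anything else (e.g. server unreachable): delete nothing, try again later.
            default:
                return Action.RETRY_LATER;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(STATUS_LOGON_FAILURE));
        System.out.println(classify(STATUS_ACCESS_DENIED));
    }
}
```

The point of the three-way split is that only `SKIP_DOCUMENT` should ever lead to a document being removed from the index; the other two outcomes leave the index untouched.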
An error must be thrown, but the docs must not be deleted (not even scanned). Furthermore, what will happen if the server is down? Are we safe in that scenario?

Cheers

2015-03-31 14:42 GMT+01:00 Karl Wright <daddy...@gmail.com>:

> This is actually pretty standard behavior across our connector family, and
> has been true since Day One. The behavior comes from the basic broad
> requirement that the crawler should keep going and skip the document when
> the permissions do not allow it to be fetched. With the Windows Share
> connector, it's sometimes the case (when DFS is used a lot) that whole
> subtrees of documents are not fetchable using the credentials supplied. So
> it is not so easy to just check for valid credentials at the beginning.
>
> For a solution, I'd be inclined to look for a way to figure out if the
> credentials are actually *invalid*, and abort the job if so. This is
> distinct from the case where the credentials are valid but the connector
> doesn't have permissions to read the document. It will take some
> experimentation to see if we get back different exception text in the two
> situations.
>
> Karl
>
>
> On Tue, Mar 31, 2015 at 9:30 AM, Alessandro Benedetti <abenede...@apache.org>
> wrote:
>
> > Hi guys,
> > playing with the Windows Shares Connector in ManifoldCF 1.8 I encountered
> > this problem:
> >
> > *Scenario*
> > *1)* Indexing a Windows Shares server
> > *2)* Indexing finishes successfully with N docs indexed
> > *3)* Offline, while no indexing is happening, the Administrator password
> > is changed on the Shares server side
> > *4)* The Repository Connector is no longer able to connect (of course,
> > because the password has changed)
> > *5)* On the next indexing cycle, ALL docs are removed from the index.
> >
> > *Expected Behaviour*
> > As a user I would like to see an error message that lets me understand
> > the issue, without losing all my N indexed docs.
> >
> > *Reason*
> > Taking a look at the code, the problem seems to be in:
> >
> > org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector#getDocumentVersions
> >
> > where it tries to access each document individually through Samba,
> > removing them one by one if they are no longer reachable.
> >
> > *Solution*
> > Before scanning each document, we have to be sure the connection is
> > working. If it is not, scanning is only harmful.
> >
> > I will continue investigating, but I would like your opinion as well.
> >
> > Cheers
> >
> > --
> > --------------------------
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>

--
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England