Currently we check each entry of the String[] oldVersions by trying to access it... So in the scenario I described, the current performance is quite bad. We should avoid scanning the oldDocs at all if we know the provided credentials are no longer valid.
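To make the idea concrete, here is a minimal sketch. All names in it are hypothetical: Repository, JobAbortedException and findDocumentsToDelete are not ManifoldCF classes or methods, and the "Connection working" success string is an assumption. The connection is checked once, and the per-document scan only runs if that check passes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types are ManifoldCF classes.
class ConnectionGuardSketch {

    /** Thrown to abort the whole job; nothing is scanned or deleted when it is raised. */
    static class JobAbortedException extends Exception {
        JobAbortedException(String message) {
            super(message);
        }
    }

    /** Stand-in for the repository connection. */
    interface Repository {
        /** Returns OK when the connection works, otherwise an explanation (assumed contract). */
        String check();

        /** Tries to access a single document with the configured credentials. */
        boolean isReachable(String documentId);
    }

    /** Assumed success message of the check. */
    static final String OK = "Connection working";

    /**
     * Returns the ids of previously indexed documents that are no longer reachable.
     * The connection is verified once, before any per-document access, so a broken
     * connection aborts the job instead of marking every document for deletion.
     */
    static List<String> findDocumentsToDelete(Repository repo, List<String> oldDocs)
            throws JobAbortedException {
        String status = repo.check();
        if (!OK.equals(status)) {
            throw new JobAbortedException("Repository connector is offline: " + status);
        }
        List<String> toDelete = new ArrayList<>();
        for (String id : oldDocs) {
            if (!repo.isReachable(id)) {
                toDelete.add(id);
            }
        }
        return toDelete;
    }
}
```

With a check that fails, the method throws before touching any document, so the already indexed docs stay where they are and the job can report the explanation returned by the check.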
Let me be extreme, but what about not allowing the job to start at all if the Repository Connector is currently broken (i.e. the connection is not working, and we know that because of the check method)? In this way we avoid destroying already existing indexes, and we simply show a message in the job saying that the job cannot start because the Repository Connector is currently offline (along with the explanation).

Does this make sense?

2015-03-31 15:30 GMT+01:00 Karl Wright <daddy...@gmail.com>:

> Hi Alessandro,
>
> If you put a check in the processDocuments method, it will be called for
> every group of documents. That's fine, but if you structure it as a
> separate call it would impact performance. That is why I suggest just
> doing a better job of interpreting the existing exceptions.
>
> Karl
>
>
> On Tue, Mar 31, 2015 at 10:27 AM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
>
> > As an addition, this should be quite simple: do not proceed with the
> > processDocuments method if the RepositoryConnector is not able to
> > connect (the check method does not return a proper message).
> >
> > Right?
> > Wondering where the proper point is to enter the action :)
> >
> > Cheers
> >
> > 2015-03-31 14:59 GMT+01:00 Alessandro Benedetti <
> > benedetti.ale...@gmail.com>:
> >
> > > Yes Karl,
> > > I was thinking exactly that, to first check if the credentials are
> > > valid, before scanning all the documents.
> > > This is because permissions per file depend on users/groups, but the
> > > current scenario is not invalidating the user, but invalidating the
> > > access of that user.
> > >
> > > An error must be thrown, but the docs not deleted (not even scanned).
> > >
> > > Furthermore, what will happen in the case the server is down?
> > > Are we safe in that scenario?
> > >
> > > Cheers
> > >
> > > 2015-03-31 14:42 GMT+01:00 Karl Wright <daddy...@gmail.com>:
> > >
> > >> This is actually pretty standard behavior across our connector family,
> > >> and has been true since Day One. The behavior comes from the basic broad
> > >> requirement that the crawler should keep going and skip the document
> > >> when the permissions do not allow it to be fetched. With the Windows
> > >> Share connector, it's sometimes the case (when DFS is used a lot) that
> > >> whole subtrees of documents are not fetchable using the credentials
> > >> supplied. So it is not so easy to just check for valid credentials at
> > >> the beginning.
> > >>
> > >> For a solution, I'd be inclined to look for a way to figure out if the
> > >> credentials are actually *invalid*, and abort the job if so. This is
> > >> distinct from the case where the credentials are valid but the
> > >> connector doesn't have permissions to read the document. It will take
> > >> some experimentation to see if we get back different exception text in
> > >> the two situations.
> > >>
> > >> Karl
> > >>
> > >>
> > >> On Tue, Mar 31, 2015 at 9:30 AM, Alessandro Benedetti <
> > >> abenede...@apache.org> wrote:
> > >>
> > >> > Hi guys,
> > >> > playing with the Windows Shares Connector in ManifoldCF 1.8 I
> > >> > encountered this problem:
> > >> >
> > >> > *Scenario*
> > >> > *1)* Indexing windows Shares server
> > >> > *2)* Indexing successfully finished with N docs indexed
> > >> > *3)* Offline, while no indexing is happening, Shares server side, the
> > >> > Administrator password changes
> > >> > *4)* Repository Connector is not able to connect anymore (of course
> > >> > because the password has changed)
> > >> > *5)* Next indexing cycle, ALL docs are removed from the index.
> > >> >
> > >> > *Expected Behaviour*
> > >> > As a user I would like to see an error message that lets me
> > >> > understand the issue, without losing all my N indexed docs.
> > >> >
> > >> > *Reason*
> > >> > Taking a look at the code, the problem seems to be in:
> > >> >
> > >> > org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector#getDocumentVersions
> > >> >
> > >> > where it tries to access each document individually through Samba,
> > >> > removing them one by one if they are not reachable anymore.
> > >> >
> > >> > *Solution*
> > >> > Before scanning each document, we have to be sure the connection is
> > >> > working.
> > >> > If it is not, the scan is only harmful.
> > >> >
> > >> > I will continue investigating, but I would like your opinion as well
> > >> >
> > >> > Cheers
> > >> >
> > >> > --
> > >> > --------------------------
> > >> >
> > >> > Benedetti Alessandro
> > >> > Visiting card : http://about.me/alessandro_benedetti
> > >> >
> > >> > "Tyger, tyger burning bright
> > >> > In the forests of the night,
> > >> > What immortal hand or eye
> > >> > Could frame thy fearful symmetry?"
> > >> >
> > >> > William Blake - Songs of Experience -1794 England
> > >> >
> > >>
> > >
> > > --
> > > --------------------------
> > >
> > > Benedetti Alessandro
> > > Visiting card : http://about.me/alessandro_benedetti
> > >
> > > "Tyger, tyger burning bright
> > > In the forests of the night,
> > > What immortal hand or eye
> > > Could frame thy fearful symmetry?"
> > >
> > > William Blake - Songs of Experience -1794 England
> > >
> >
> > --
> > --------------------------
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>

--
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England