Hi Karl,

Back to the issue: I think I have solved it with a quick, minimally intrusive change in

org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector#getDocumentVersions
org/apache/manifoldcf/crawler/connectors/sharedrive/SharedDriveConnector.java:706

...
catch ( jcifs.smb.SmbAuthException e )
{
  Logging.connectors.warn( "JCIFS: Authorization exception reading version information for " + documentIdentifier + " - skipping" );
  // Constant-first comparison, so a null exception message cannot cause a NullPointerException
  if ( "Logon failure: unknown user name or bad password.".equals( e.getMessage() ) )
    // Bad credentials: abort instead of marking the document for removal
    throw new ManifoldCFException( "SmbAuthException thrown: " + e.getMessage(), e );
  else
    // Any other authorization problem: skip this document, as before
    rval[i] = null;
}
...

With this change the exception message is checked. If it is a logon failure,
we throw the ManifoldCFException, which breaks the iteration (a logon failure
means no documents will be accessible, but we must not erase them).
If it is any other authorization exception (for example, the permissions on the
folder/file have changed), the behaviour is the same as before.

I think this should be enough to be safe, what do you think?
Is any other method affected by this problem? I think it should be limited to
the versions check.

Cheers

2015-04-02 16:32 GMT+01:00 Alessandro Benedetti <benedetti.ale...@gmail.com>:
>
>
> 2015-04-02 15:58 GMT+01:00 Karl Wright <daddy...@gmail.com>:
>
>> Hi Alessandro,
>>
>> Yes, you interpreted my reply correctly.
>>
>> I think we therefore have to perform any checking operations on the actual
>> file being accessed.  This is actually pretty easy to do without
>> sacrificing performance.  All you need to do is the following:
>>
>> try {
>>   ... do the file access operation ...
>> } catch (SmbException e) {
>>   ... figure out from the exception whether to throw a ManifoldCFException
>>   or a ServiceInterruption ...
>>   ... If the exception does not include enough to distinguish between bad
>>   credentials and insufficient privs, then do a check RIGHT HERE for bad
>>   credentials ...
>> }
>>
>> What do you think?  The new code would only ever be called if the document
>> cannot be read.
>>
>
> I think we can proceed like you said, I am investigating right now the
> details returned for the exception (to understand if there is any
> difference between wrong credentials or access denied)
> In the case we find the "wrong credential" we have to throw the exception
> and stop the iteration (this will happen the very first time assuming none
> is playing server side).
> In this way we save the time of checking all the files (in the case of
> wrong credentials no one will be accessible).
>
> Another way can be to do this credential check at the beginning and stop
> only if we have wrong credential (leaving the permission check file by
> file).
>
> Quite a confused scenario, but we can sort this out with little changes :)
>
>>
>> Karl
>>
>>
>> On Thu, Apr 2, 2015 at 10:42 AM, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:
>>
>> > OK, I am currently working on that, and I will work on that next Tuesday as
>> > well.
>> > But what about point 2:
>> > "(2) the check itself is
>> > specific to the ROOT of the tree, which the user may not have access to."
>> >
>> > I think I got your problem, you mean that a possible scenario can happen
>> > when you configure the repository connector with a user that is not able
>> > to access the root but is able to access the directories we want to crawl.
>> > In such a case the repository connector will appear to be not able to
>> > connect, while the crawling will be still possible if you configure the
>> > accessible directories in the job.
>> > If this is correct, the situation is more complicated ...
>> >
>> > Cheers
>> >
>> >
>> > 2015-03-31 16:44 GMT+01:00 Karl Wright <daddy...@gmail.com>:
>> >
>> > > Hi Alessandro,
>> > >
>> > > Your code snippet has two problems: (1) it doesn't distinguish between
>> > > service interruptions and bad credentials,
>> >
>> > Should not be the difference between the IOException and the Smb one ?
>> >
>> > > and (2) the check itself is
>> > > specific to the ROOT of the tree, which the user may not have access to.
>> > >
>> > > In check() we can get away with this but if you wire up the check() logic
>> > > into the crawl processing it will break some people.
>> > >
>> > > The first problem, (1), is exactly what we need to figure out anyway.
>> > >
>> > > Karl
>> > >
>> > >
>> > > On Tue, Mar 31, 2015 at 11:30 AM, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:
>> > >
>> > > > Hi karl comments follow :
>> > > >
>> > > > 2015-03-31 16:18 GMT+01:00 Karl Wright <daddy...@gmail.com>:
>> > > >
>> > > > > Hi Alessandro,
>> > > > >
>> > > > > There are situations where the check() method does not succeed but you can
>> > > > > still crawl.  So I would not do it that way, since it fundamentally changes
>> > > > > the contract.
>> > > > >
>> > > >
>> > > > Am I wrong or we should assume the "check()" method to work as it's built
>> > > > for.
>> > > > I mean if in some case, this method is wrongly implemented , this can not
>> > > > break another assumption.
>> > > >
>> > > > >
>> > > > > My proposal is to have processDocuments ABORT the job when it finds bad
>> > > > > credentials.  That's very fast and will not permit a job to run for a long
>> > > > > time.
>> > > > >
>> > > > > The trick is to determine if there are bad credentials WITHOUT doing any
>> > > > > more work in the processDocuments pathway than we currently are.  An
>> > > > > exception will be thrown either way, but we need to figure out whether
>> > > > > there is any information in the exception that we can use to decide between
>> > > > > bad credentials and no access permissions.
>> > > > >
>> > > > > You can help provide that by doing a simple experiment on your client's
>> > > > > hardware (or yours, if you have such hardware in house).  Change the
>> > > > > credential to an invalid one and see what the exception details are.  Then
>> > > > > change to valid credentials and try to crawl a directory that is not
>> > > > > visible to the credentialed user you supplied, and make a note of the
>> > > > > exception details in that case too.
>> > > > >
>> > > >
>> > > > I was thinking to slightly modifying the getSession() method adding the
>> > > > file exist check , something like this :
>> > > >
>> > > > ...
>> > > >
>> > > > try
>> > > > {
>> > > >   // use NtlmPasswordAuthentication so that we can reuse credential for DFS support
>> > > >   pa = new NtlmPasswordAuthentication( domain, username, password );
>> > > >   SmbFile smbconnection = new SmbFile( "smb://" + server + "/", pa );
>> > > >   smbconnectionPath = getFileCanonicalPath( smbconnection );
>> > > >   smbconnection.exists();
>> > > > }
>> > > > catch ( MalformedURLException e )
>> > > > {
>> > > >   Logging.connectors.error(
>> > > >     "Unable to access SMB/CIFS share: " + "smb://" + ( ( domain == null ) ? "" : domain ) + ";"
>> > > >     + username + ":<password>@" + server + "/\n" + e );
>> > > >   throw new ManifoldCFException( "Unable to access SMB/CIFS share: " + server, e,
>> > > >     ManifoldCFException.REPOSITORY_CONNECTION_ERROR );
>> > > > } catch (SmbException e) {
>> > > >   Logging.connectors.error(
>> > > >     "Unable to access SMB/CIFS share: Credential not valid - "
>> > > >     + "smb://" + ((domain == null) ? "" : domain) + ";"
>> > > >     + username + ":<password>@" + server + "/\n" + e);
>> > > >   throw new ManifoldCFException( "Unable to access SMB/CIFS share: Credential not valid - " + server, e,
>> > > >     ManifoldCFException.REPOSITORY_CONNECTION_ERROR );
>> > > > }
>> > > >
>> > > > Catching the smbException should make the trick.
>> > > > Anyway I will go more in details.
>> > > >
>> > > > Cheers
>> > > >
>> > > >
>> > > > > Karl
>> > > > >
>> > > > >
>> > > > > On Tue, Mar 31, 2015 at 10:50 AM, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:
>> > > > >
>> > > > > > Currently we are checking each of the String[] oldVersions , trying to
>> > > > > > access it ...
>> > > > > > So in the scenario I described the current performances are quite bad...
>> > > > > > We would need to avoid at all the scan of the oldDocs if we know the
>> > > > > > provided credential are not valid anymore .
>> > > > > >
>> > > > > > Let me be extreme, but what about not allowing the job to start at all if
>> > > > > > the Repository Connector is currently broken ( i.e. the connection is not
>> > > > > > working, and we know that because of the check method) .
>> > > > > > In this way we avoid to destroy already existent indexes and we simply
>> > > > > > communicate a message in the job giving advice the job can not start
>> > > > > > because Repository connector is currently offline ( and showing the
>> > > > > > explanation) .
>> > > > > >
>> > > > > > Does this make sense ?
>> > > > > >
>> > > > > > 2015-03-31 15:30 GMT+01:00 Karl Wright <daddy...@gmail.com>:
>> > > > > >
>> > > > > > > Hi Alessandro,
>> > > > > > >
>> > > > > > > If you put a check in the processDocuments method, it will be called for
>> > > > > > > every group of documents.  That's fine, but if you structure it as a
>> > > > > > > separate call it would impact performance.  That is why I suggest just
>> > > > > > > doing a better job of interpreting the existing exceptions.
>> > > > > > >
>> > > > > > > Karl
>> > > > > > >
>> > > > > > >
>> > > > > > > On Tue, Mar 31, 2015 at 10:27 AM, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > As an addition, this should be quite simple, not proceeding with the
>> > > > > > > > processDocuments method, if the RepositoryConnector is not able to connect(
>> > > > > > > > check method return not a proper message).
>> > > > > > > >
>> > > > > > > > Right ?
>> > > > > > > > Wondering where is the proper point to enter the action :)
>> > > > > > > >
>> > > > > > > > Cheers
>> > > > > > > >
>> > > > > > > > 2015-03-31 14:59 GMT+01:00 Alessandro Benedetti <benedetti.ale...@gmail.com>:
>> > > > > > > >
>> > > > > > > > > Yes Karl,
>> > > > > > > > > I was thinking exactly that, to first check if the credentials are
>> > > > > > > > > valid, before scanning all the documents.
>> > > > > > > > > This because permissions per files depend on users/groups, but the
>> > > > > > > > > current scenario is not in-validating the user, but invalidating the
>> > > > > > > > > access of that user.
>> > > > > > > > >
>> > > > > > > > > An error must be thrown, but the docs not deleted ( not even scanned) .
>> > > > > > > > >
>> > > > > > > > > Furthermore, what will happen, in the case the server is down ?
>> > > > > > > > > Are we safe in that scenario ?
>> > > > > > > > >
>> > > > > > > > > Cheers
>> > > > > > > > >
>> > > > > > > > > 2015-03-31 14:42 GMT+01:00 Karl Wright <daddy...@gmail.com>:
>> > > > > > > > >
>> > > > > > > > >> This is actually pretty standard behavior across our connector family,
>> > > > > > > > >> and has been true since Day One.  The behavior comes from the basic
>> > > > > > > > >> broad requirement that the crawler should keep going and skip the
>> > > > > > > > >> document when the permissions do not allow it to be fetched.  With the
>> > > > > > > > >> Windows Share connector, it's sometimes the case (when DFS is used a
>> > > > > > > > >> lot) that whole subtrees of documents are not fetchable using the
>> > > > > > > > >> credentials supplied.  So it is not so easy to just check for valid
>> > > > > > > > >> credentials at the beginning.
>> > > > > > > > >>
>> > > > > > > > >> For a solution, I'd be inclined to look for a way to figure out if the
>> > > > > > > > >> credentials are actually *invalid*, and abort the job if so.  This is
>> > > > > > > > >> distinct from the case where the credentials are valid but the
>> > > > > > > > >> connector doesn't have permissions to read the document.  It will take
>> > > > > > > > >> some experimentation to see if we get back different exception text in
>> > > > > > > > >> the two situations.
>> > > > > > > > >>
>> > > > > > > > >> Karl
>> > > > > > > > >>
>> > > > > > > > >>
>> > > > > > > > >> On Tue, Mar 31, 2015 at 9:30 AM, Alessandro Benedetti <abenede...@apache.org> wrote:
>> > > > > > > > >>
>> > > > > > > > >> > Hi guys,
>> > > > > > > > >> > playing with the Windows Shares Connector in ManifoldCF 1.8 I
>> > > > > > > > >> > encountered this problem :
>> > > > > > > > >> >
>> > > > > > > > >> > *Scenario*
>> > > > > > > > >> > *1)* Indexing windows Shares server
>> > > > > > > > >> > *2)* Indexing successfully finished with N docs indexed
>> > > > > > > > >> > *3)* Offline, while no indexing is happening, Shares server side, the
>> > > > > > > > >> > Administrator password changes
>> > > > > > > > >> > *4)* Repository Connector is not able to connect anymore (of course
>> > > > > > > > >> > because the password has changed)
>> > > > > > > > >> > *5)* Next indexing cycle, ALL docs are removed from the index .
>> > > > > > > > >> >
>> > > > > > > > >> > *Expected Behaviour*
>> > > > > > > > >> > As a user I would like to see an error message, that will let me
>> > > > > > > > >> > understand the issue, not losing all my N indexed docs .
>> > > > > > > > >> >
>> > > > > > > > >> > *Reason*
>> > > > > > > > >> > Taking a look into the code, the problem seems to be in the :
>> > > > > > > > >> >
>> > > > > > > > >> > org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector#getDocumentVersions
>> > > > > > > > >> >
>> > > > > > > > >> > where it tries to access each document singularly through Samba, and
>> > > > > > > > >> > removing them one by one if not reachable anymore.
>> > > > > > > > >> >
>> > > > > > > > >> > *Solution*
>> > > > > > > > >> > Before scanning each document, we have to be sure the connection is
>> > > > > > > > >> > working.
>> > > > > > > > >> > If not this is only harmful.
>> > > > > > > > >> >
>> > > > > > > > >> > I will continue investigating, but I would like your opinion as well
>> > > > > > > > >> >
>> > > > > > > > >> > Cheers
>> > > > > > > > >> >
>> > > > > > > > >> > --
>> > > > > > > > >> > --------------------------
>> > > > > > > > >> >
>> > > > > > > > >> > Benedetti Alessandro
>> > > > > > > > >> > Visiting card : http://about.me/alessandro_benedetti
>> > > > > > > > >> >
>> > > > > > > > >> > "Tyger, tyger burning bright
>> > > > > > > > >> > In the forests of the night,
>> > > > > > > > >> > What immortal hand or eye
>> > > > > > > > >> > Could frame thy fearful symmetry?"
>> > > > > > > > >> >
>> > > > > > > > >> > William Blake - Songs of Experience -1794 England

--
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
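
For reference, the decision discussed in this thread (abort the job on bad credentials, skip the single document on any other authorization failure) can be sketched as a small standalone helper. This is only an illustration under stated assumptions, not the connector's actual code: the class SmbAuthFailureClassifier, its Outcome enum and the classify method are hypothetical names. The message text is the one compared in the catch block at the top of this mail, and 0xC000006D is the Windows NTSTATUS value for STATUS_LOGON_FAILURE. Whether the jcifs version in use exposes that status on the exception (for example via SmbException.getNtStatus()) is an assumption that should be verified before relying on it.

```java
/**
 * Hypothetical helper (not part of ManifoldCF or jcifs) that decides how a
 * crawler should react to an SMB authorization failure.
 */
final class SmbAuthFailureClassifier {

  /** What the caller should do with the document whose access failed. */
  enum Outcome { ABORT_JOB, SKIP_DOCUMENT }

  // Message text taken from the check quoted in this thread.
  static final String LOGON_FAILURE_MESSAGE =
      "Logon failure: unknown user name or bad password.";

  // Windows NTSTATUS value for STATUS_LOGON_FAILURE.
  static final int STATUS_LOGON_FAILURE = 0xC000006D;

  private SmbAuthFailureClassifier() {
    // static helper, no instances
  }

  /**
   * @param message  the exception message, may be null
   * @param ntStatus the NT status code if the caller has one, otherwise 0
   * @return ABORT_JOB for bad credentials, SKIP_DOCUMENT for anything else
   */
  static Outcome classify(String message, int ntStatus) {
    // Constant-first equals() keeps the check safe when message is null.
    boolean badCredentials =
        ntStatus == STATUS_LOGON_FAILURE || LOGON_FAILURE_MESSAGE.equals(message);
    return badCredentials ? Outcome.ABORT_JOB : Outcome.SKIP_DOCUMENT;
  }
}
```

In the catch block the caller would throw the ManifoldCFException when classify(...) returns ABORT_JOB and set rval[i] = null otherwise. Preferring the status code over the message text, where it is available, would make the check less sensitive to changes in the wording of the message.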