Hi Karl,
comments follow:

2015-03-31 16:18 GMT+01:00 Karl Wright <daddy...@gmail.com>:

> Hi Alessandro,
>
> There are situations where the check() method does not succeed but you can
> still crawl.  So I would not do it that way, since it fundamentally changes
> the contract.

Am I wrong, or should we assume the check() method works as it was built to?
I mean, if in some case this method is wrongly implemented, that should not
break another assumption.

> My proposal is to have processDocuments ABORT the job when it finds bad
> credentials.  That's very fast and will not permit a job to run for a long
> time.
>
> The trick is to determine if there are bad credentials WITHOUT doing any
> more work in the processDocuments pathway than we currently are.  An
> exception will be thrown either way, but we need to figure out whether
> there is any information in the exception that we can use to decide between
> bad credentials and no access permissions.
>
> You can help provide that by doing a simple experiment on your client's
> hardware (or yours, if you have such hardware in house).  Change the
> credential to an invalid one and see what the exception details are.  Then
> change to valid credentials and try to crawl a directory that is not
> visible to the credentialed user you supplied, and make a note of the
> exception details in that case too.

I was thinking of slightly modifying the getSession() method by adding a
file-existence check, something like this:

    ...
    try
    {
      // use NtlmPasswordAuthentication so that we can reuse credential for DFS support
      pa = new NtlmPasswordAuthentication( domain, username, password );
      SmbFile smbconnection = new SmbFile( "smb://" + server + "/", pa );
      smbconnectionPath = getFileCanonicalPath( smbconnection );
      // added: touch the share so that a credential problem surfaces here
      smbconnection.exists();
    }
    catch ( MalformedURLException e )
    {
      Logging.connectors.error( "Unable to access SMB/CIFS share: "
        + "smb://" + ( ( domain == null ) ? "" : domain ) + ";" + username
        + ":<password>@" + server + "/\n" + e );
      throw new ManifoldCFException( "Unable to access SMB/CIFS share: " + server,
        e, ManifoldCFException.REPOSITORY_CONNECTION_ERROR );
    }
    catch ( SmbException e )
    {
      // added: report the failure as a credential problem
      Logging.connectors.error( "Unable to access SMB/CIFS share: Credential not valid - "
        + "smb://" + ( ( domain == null ) ? "" : domain ) + ";" + username
        + ":<password>@" + server + "/\n" + e );
      throw new ManifoldCFException( "Unable to access SMB/CIFS share: Credential not valid - " + server,
        e, ManifoldCFException.REPOSITORY_CONNECTION_ERROR );
    }

Catching the SmbException should do the trick.
Anyway, I will go into more detail.

Cheers

> Karl
>
>
> On Tue, Mar 31, 2015 at 10:50 AM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
>
> > Currently we are checking each of the String[] oldVersions, trying to
> > access it ...
> > So in the scenario I described the current performance is quite bad...
> > We would need to avoid scanning the oldDocs at all if we know the
> > provided credentials are not valid anymore.
> >
> > Let me be extreme, but what about not allowing the job to start at all if
> > the Repository Connector is currently broken (i.e. the connection is not
> > working, and we know that because of the check method)?
> > In this way we avoid destroying already existing indexes and we simply
> > show a message in the job advising that the job cannot start
> > because the Repository Connector is currently offline (and showing the
> > explanation).
> >
> > Does this make sense ?
> >
> > 2015-03-31 15:30 GMT+01:00 Karl Wright <daddy...@gmail.com>:
> >
> > > Hi Alessandro,
> > >
> > > If you put a check in the processDocuments method, it will be called for
> > > every group of documents.  That's fine, but if you structure it as a
> > > separate call it would impact performance.  That is why I suggest just
> > > doing a better job of interpreting the existing exceptions.
> > >
> > > Karl
> > >
> > >
> > > On Tue, Mar 31, 2015 at 10:27 AM, Alessandro Benedetti <
> > > benedetti.ale...@gmail.com> wrote:
> > >
> > > > As an addition, this should be quite simple: do not proceed with the
> > > > processDocuments method if the RepositoryConnector is not able to
> > > > connect (the check method does not return a proper message).
> > > >
> > > > Right ?
> > > > Wondering where the proper point to enter the action is :)
> > > >
> > > > Cheers
> > > >
> > > > 2015-03-31 14:59 GMT+01:00 Alessandro Benedetti <
> > > > benedetti.ale...@gmail.com>:
> > > >
> > > > > Yes Karl,
> > > > > I was thinking exactly that, to first check if the credentials are
> > > > > valid, before scanning all the documents.
> > > > > This is because permissions per file depend on users/groups, but the
> > > > > current scenario is not invalidating the user, but invalidating the
> > > > > access of that user.
> > > > >
> > > > > An error must be thrown, but the docs not deleted (not even scanned).
> > > > >
> > > > > Furthermore, what will happen in the case the server is down?
> > > > > Are we safe in that scenario?
> > > > >
> > > > > Cheers
> > > > >
> > > > > 2015-03-31 14:42 GMT+01:00 Karl Wright <daddy...@gmail.com>:
> > > > >
> > > > >> This is actually pretty standard behavior across our connector
> > > > >> family, and has been true since Day One.  The behavior comes from
> > > > >> the basic broad requirement that the crawler should keep going and
> > > > >> skip the document when the permissions do not allow it to be
> > > > >> fetched.  With the Windows Share connector, it's sometimes the case
> > > > >> (when DFS is used a lot) that whole subtrees of documents are not
> > > > >> fetchable using the credentials supplied.  So it is not so easy to
> > > > >> just check for valid credentials at the beginning.
> > > > >>
> > > > >> For a solution, I'd be inclined to look for a way to figure out if
> > > > >> the credentials are actually *invalid*, and abort the job if so.
> > > > >> This is distinct from the case where the credentials are valid but
> > > > >> the connector doesn't have permissions to read the document.  It
> > > > >> will take some experimentation to see if we get back different
> > > > >> exception text in the two situations.
> > > > >>
> > > > >> Karl
> > > > >>
> > > > >>
> > > > >> On Tue, Mar 31, 2015 at 9:30 AM, Alessandro Benedetti <
> > > > >> abenede...@apache.org> wrote:
> > > > >>
> > > > >> > Hi guys,
> > > > >> > playing with the Windows Shares Connector in ManifoldCF 1.8 I
> > > > >> > encountered this problem:
> > > > >> >
> > > > >> > *Scenario*
> > > > >> > *1)* Indexing windows Shares server
> > > > >> > *2)* Indexing successfully finished with N docs indexed
> > > > >> > *3)* Offline, while no indexing is happening, Shares server side,
> > > > >> > the Administrator password changes
> > > > >> > *4)* Repository Connector is not able to connect anymore (of
> > > > >> > course, because the password has changed)
> > > > >> > *5)* Next indexing cycle, ALL docs are removed from the index.
> > > > >> >
> > > > >> > *Expected Behaviour*
> > > > >> > As a user I would like to see an error message that lets me
> > > > >> > understand the issue, not lose all my N indexed docs.
> > > > >> >
> > > > >> > *Reason*
> > > > >> > Taking a look into the code, the problem seems to be in:
> > > > >> >
> > > > >> > org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector#getDocumentVersions
> > > > >> >
> > > > >> > where it tries to access each document individually through Samba,
> > > > >> > removing them one by one if not reachable anymore.
> > > > >> >
> > > > >> > *Solution*
> > > > >> > Before scanning each document, we have to be sure the connection
> > > > >> > is working.
> > > > >> > If not, this is only harmful.
> > > > >> >
> > > > >> > I will continue investigating, but I would like your opinion as
> > > > >> > well
> > > > >> >
> > > > >> > Cheers
> > > > >> >
> > > > >> > --
> > > > >> > --------------------------
> > > > >> >
> > > > >> > Benedetti Alessandro
> > > > >> > Visiting card : http://about.me/alessandro_benedetti
> > > > >> >
> > > > >> > "Tyger, tyger burning bright
> > > > >> > In the forests of the night,
> > > > >> > What immortal hand or eye
> > > > >> > Could frame thy fearful symmetry?"
> > > > >> >
> > > > >> > William Blake - Songs of Experience -1794 England
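
The distinction discussed in this thread (abort the job on invalid credentials, but keep skipping documents that are merely unreadable) could be sketched as below. This is only an illustration under assumptions: the class `SmbFailureClassifier`, its enum, and its methods are hypothetical and are not part of ManifoldCF or jCIFS. It assumes an NT status code can be read from the thrown exception (jCIFS's `SmbException` appears to expose one through `getNtStatus()`), and which status a given server actually returns for a changed password is exactly what the experiment suggested above would need to confirm. The numeric values are the standard Windows NTSTATUS codes.

```java
/** Hypothetical helper; not part of ManifoldCF or jCIFS. */
final class SmbFailureClassifier {

    enum Failure { BAD_CREDENTIALS, ACCESS_DENIED, OTHER }

    // Standard Windows NTSTATUS values, written as int literals.
    static final int STATUS_ACCESS_DENIED      = 0xC0000022;
    static final int STATUS_WRONG_PASSWORD     = 0xC000006A;
    static final int STATUS_LOGON_FAILURE      = 0xC000006D;
    static final int STATUS_PASSWORD_EXPIRED   = 0xC0000071;
    static final int STATUS_ACCOUNT_DISABLED   = 0xC0000072;
    static final int STATUS_ACCOUNT_LOCKED_OUT = 0xC0000234;

    /**
     * Maps an NT status code to a coarse failure category.
     * Assumption: logon-related statuses mean the connection credentials
     * themselves are unusable, while ACCESS_DENIED means the credentials
     * work but this particular resource is not readable.
     */
    static Failure classify(int ntStatus) {
        switch (ntStatus) {
            case STATUS_WRONG_PASSWORD:
            case STATUS_LOGON_FAILURE:
            case STATUS_PASSWORD_EXPIRED:
            case STATUS_ACCOUNT_DISABLED:
            case STATUS_ACCOUNT_LOCKED_OUT:
                return Failure.BAD_CREDENTIALS;
            case STATUS_ACCESS_DENIED:
                return Failure.ACCESS_DENIED;
            default:
                return Failure.OTHER;
        }
    }

    /** True when the whole job should stop instead of skipping one document. */
    static boolean shouldAbortJob(int ntStatus) {
        return classify(ntStatus) == Failure.BAD_CREDENTIALS;
    }

    public static void main(String[] args) {
        // Invalid credentials: abort, so indexed documents are left alone.
        System.out.println(shouldAbortJob(STATUS_LOGON_FAILURE));
        // Valid credentials, unreadable subtree: keep crawling and skip.
        System.out.println(shouldAbortJob(STATUS_ACCESS_DENIED));
    }
}
```

A classification like this adds no extra round trip in the processDocuments pathway, since it only inspects an exception that is thrown anyway. Anything that falls into `OTHER` (for example, the server being down) would still need its own policy.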