2015-04-02 15:58 GMT+01:00 Karl Wright <daddy...@gmail.com>:

> Hi Alessandro,
>
> Yes, you interpreted my reply correctly.
>
> I think we therefore have to perform any checking operations on the actual
> file being accessed. This is actually pretty easy to do without
> sacrificing performance. All you need to do is the following:
>
> try {
>   ... do the file access operation ...
> } catch (SmbException e) {
>   ... figure out from the exception whether to throw a ManifoldCFException
>   or a ServiceInterruption ...
>   ... If the exception does not include enough to distinguish between bad
>   credentials and insufficient privs, then do a check RIGHT HERE for bad
>   credentials ...
> }
>
> What do you think? The new code would only ever be called if the document
> cannot be read.
>
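A minimal sketch of the classification step outlined in the quoted pseudocode, assuming the failed SMB call exposes a numeric NT status (jcifs's SmbException.getNtStatus() is assumed to be the source of that number). SmbFailureClassifier, Action and classify are hypothetical names, not ManifoldCF or jcifs API; the constants are the standard Windows NT status values. Whether a real server actually returns distinct codes for bad credentials and insufficient privileges is exactly what still has to be confirmed by experiment.

```java
/** Hypothetical helper, not part of ManifoldCF or jcifs. */
final class SmbFailureClassifier {

    /** What the crawl should do after a failed file access. */
    enum Action { ABORT_JOB, SKIP_DOCUMENT, RETRY_LATER }

    // Standard Windows NT status values, declared locally so the sketch
    // compiles without any SMB library on the classpath.
    static final int NT_STATUS_ACCESS_DENIED      = 0xC0000022;
    static final int NT_STATUS_WRONG_PASSWORD     = 0xC000006A;
    static final int NT_STATUS_LOGON_FAILURE      = 0xC000006D;
    static final int NT_STATUS_PASSWORD_EXPIRED   = 0xC0000071;
    static final int NT_STATUS_ACCOUNT_DISABLED   = 0xC0000072;
    static final int NT_STATUS_ACCOUNT_LOCKED_OUT = 0xC0000234;

    /** Maps the NT status of a failed access to the action to take. */
    static Action classify(int ntStatus) {
        switch (ntStatus) {
            // The credentials themselves are unusable: no document will be
            // readable, so stop the job instead of scanning every file.
            case NT_STATUS_WRONG_PASSWORD:
            case NT_STATUS_LOGON_FAILURE:
            case NT_STATUS_PASSWORD_EXPIRED:
            case NT_STATUS_ACCOUNT_DISABLED:
            case NT_STATUS_ACCOUNT_LOCKED_OUT:
                return Action.ABORT_JOB;
            // Valid user, but no permission on this document: skip it and
            // keep crawling.
            case NT_STATUS_ACCESS_DENIED:
                return Action.SKIP_DOCUMENT;
            // Anything else is treated as a possible service interruption.
            default:
                return Action.RETRY_LATER;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(NT_STATUS_LOGON_FAILURE));
        System.out.println(classify(NT_STATUS_ACCESS_DENIED));
        System.out.println(classify(0xC0000001)); // some other failure
    }
}
```

In the connector, ABORT_JOB would presumably map to throwing a ManifoldCFException and RETRY_LATER to a ServiceInterruption. If the server turns out to report access denied for bad credentials as well, the SKIP_DOCUMENT branch is where the extra credential check would have to go.
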
I think we can proceed as you said. I am currently investigating the
details returned with the exception, to see whether anything distinguishes
wrong credentials from access denied.

If we detect wrong credentials, we have to throw the exception and stop the
iteration (this will happen on the very first file, assuming nobody is
changing things on the server side in the meantime). That saves the time of
checking all the files, since with wrong credentials none of them will be
accessible.

Another option is to do the credential check once at the beginning and stop
only if the credentials are wrong, leaving the permission check to happen
file by file.

It is quite a confusing scenario, but we can sort it out with small
changes :)

>
> Karl
>
>
> On Thu, Apr 2, 2015 at 10:42 AM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
>
> > OK, I am currently working on that, and I will work on it next Tuesday
> > as well.
> > But what about point 2:
> > "(2) the check itself is
> > specific to the ROOT of the tree, which the user may not have access to."
> >
> > I think I understand your concern: a possible scenario is that you
> > configure the repository connector with a user that cannot access the
> > root but can access the directories we want to crawl.
> > In such a case the repository connector will appear unable to connect,
> > while crawling will still be possible if you configure the accessible
> > directories in the job.
> > If this is correct, the situation is more complicated ...
> >
> > Cheers
> >
> >
> > 2015-03-31 16:44 GMT+01:00 Karl Wright <daddy...@gmail.com>:
> >
> > > Hi Alessandro,
> > >
> > > Your code snippet has two problems: (1) it doesn't distinguish between
> > > service interruptions and bad credentials,
> >
> > Shouldn't that be the difference between the IOException and the
> > SmbException?
> >
> > > and (2) the check itself is
> > > specific to the ROOT of the tree, which the user may not have access to.
> > >
> > > In check() we can get away with this but if you wire up the check() logic
> > > into the crawl processing it will break some people.
> > >
> > > The first problem, (1), is exactly what we need to figure out anyway.
> > >
> > > Karl
> > >
> > >
> > > On Tue, Mar 31, 2015 at 11:30 AM, Alessandro Benedetti <
> > > benedetti.ale...@gmail.com> wrote:
> > >
> > > > Hi Karl, comments follow:
> > > >
> > > > 2015-03-31 16:18 GMT+01:00 Karl Wright <daddy...@gmail.com>:
> > > >
> > > > > Hi Alessandro,
> > > > >
> > > > > There are situations where the check() method does not succeed but you
> > > > > can still crawl. So I would not do it that way, since it fundamentally
> > > > > changes the contract.
> > > >
> > > > Am I wrong, or should we assume that the check() method works as it was
> > > > built to? I mean, if in some case this method is wrongly implemented,
> > > > this cannot break another assumption.
> > > >
> > > > > My proposal is to have processDocuments ABORT the job when it finds bad
> > > > > credentials. That's very fast and will not permit a job to run for a
> > > > > long time.
> > > > >
> > > > > The trick is to determine if there are bad credentials WITHOUT doing any
> > > > > more work in the processDocuments pathway than we currently are. An
> > > > > exception will be thrown either way, but we need to figure out whether
> > > > > there is any information in the exception that we can use to decide
> > > > > between bad credentials and no access permissions.
> > > > >
> > > > > You can help provide that by doing a simple experiment on your client's
> > > > > hardware (or yours, if you have such hardware in house). Change the
> > > > > credential to an invalid one and see what the exception details are.
> > > > > Then change to valid credentials and try to crawl a directory that is
> > > > > not visible to the credentialed user you supplied, and make a note of
> > > > > the exception details in that case too.
> > > >
> > > > I was thinking of slightly modifying the getSession() method, adding the
> > > > file-exists check, something like this:
> > > >
> > > > ...
> > > >
> > > > try
> > > > {
> > > >   // use NtlmPasswordAuthentication so that we can reuse credential for DFS support
> > > >   pa = new NtlmPasswordAuthentication( domain, username, password );
> > > >   SmbFile smbconnection = new SmbFile( "smb://" + server + "/", pa );
> > > >   smbconnectionPath = getFileCanonicalPath( smbconnection );
> > > >   smbconnection.exists();
> > > > }
> > > > catch ( MalformedURLException e )
> > > > {
> > > >   Logging.connectors.error(
> > > >     "Unable to access SMB/CIFS share: " + "smb://" + ( ( domain == null ) ? "" : domain ) + ";"
> > > >       + username + ":<password>@" + server + "/\n" + e );
> > > >   throw new ManifoldCFException( "Unable to access SMB/CIFS share: " + server, e,
> > > >     ManifoldCFException.REPOSITORY_CONNECTION_ERROR );
> > > > } catch (SmbException e) {
> > > >   Logging.connectors.error(
> > > >     "Unable to access SMB/CIFS share: Credential not valid - "
> > > >       + "smb://" + ((domain == null) ? "" : domain) + ";"
> > > >       + username + ":<password>@" + server + "/\n" + e);
> > > >   throw new ManifoldCFException( "Unable to access SMB/CIFS share: Credential not valid - " + server, e,
> > > >     ManifoldCFException.REPOSITORY_CONNECTION_ERROR );
> > > > }
> > > >
> > > > Catching the SmbException should do the trick.
> > > > Anyway, I will go into more detail.
> > > >
> > > > Cheers
> > > >
> > > > > Karl
> > > > >
> > > > >
> > > > > On Tue, Mar 31, 2015 at 10:50 AM, Alessandro Benedetti <
> > > > > benedetti.ale...@gmail.com> wrote:
> > > > >
> > > > > > Currently we check each of the String[] oldVersions, trying to
> > > > > > access it ...
> > > > > > So in the scenario I described the current performance is quite bad ...
> > > > > > We would need to avoid scanning the oldDocs at all if we know the
> > > > > > provided credentials are no longer valid.
> > > > > >
> > > > > > Let me be extreme, but what about not allowing the job to start at all
> > > > > > if the Repository Connector is currently broken (i.e. the connection is
> > > > > > not working, and we know that because of the check method)?
> > > > > > In this way we avoid destroying already existing indexes and we simply
> > > > > > show a message in the job, advising that the job cannot start because
> > > > > > the Repository Connector is currently offline (and showing the
> > > > > > explanation).
> > > > > >
> > > > > > Does this make sense?
> > > > > >
> > > > > > 2015-03-31 15:30 GMT+01:00 Karl Wright <daddy...@gmail.com>:
> > > > > >
> > > > > > > Hi Alessandro,
> > > > > > >
> > > > > > > If you put a check in the processDocuments method, it will be called
> > > > > > > for every group of documents. That's fine, but if you structure it as
> > > > > > > a separate call it would impact performance. That is why I suggest
> > > > > > > just doing a better job of interpreting the existing exceptions.
> > > > > > >
> > > > > > > Karl
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Mar 31, 2015 at 10:27 AM, Alessandro Benedetti <
> > > > > > > benedetti.ale...@gmail.com> wrote:
> > > > > > >
> > > > > > > > In addition, this should be quite simple: do not proceed with the
> > > > > > > > processDocuments method if the RepositoryConnector is not able to
> > > > > > > > connect (i.e. the check method does not return a proper message).
> > > > > > > >
> > > > > > > > Right?
> > > > > > > > Wondering where the proper point to add the action is :)
> > > > > > > >
> > > > > > > > Cheers
> > > > > > > >
> > > > > > > > 2015-03-31 14:59 GMT+01:00 Alessandro Benedetti <
> > > > > > > > benedetti.ale...@gmail.com>:
> > > > > > > >
> > > > > > > > > Yes Karl,
> > > > > > > > > I was thinking exactly that: first check whether the credentials
> > > > > > > > > are valid, before scanning all the documents.
> > > > > > > > > This is because per-file permissions depend on users/groups, but
> > > > > > > > > the current scenario is not invalidating the user, but
> > > > > > > > > invalidating the access of that user.
> > > > > > > > >
> > > > > > > > > An error must be thrown, but the docs must not be deleted (not
> > > > > > > > > even scanned).
> > > > > > > > >
> > > > > > > > > Furthermore, what will happen if the server is down?
> > > > > > > > > Are we safe in that scenario?
> > > > > > > > >
> > > > > > > > > Cheers
> > > > > > > > >
> > > > > > > > > 2015-03-31 14:42 GMT+01:00 Karl Wright <daddy...@gmail.com>:
> > > > > > > > >
> > > > > > > > >> This is actually pretty standard behavior across our connector
> > > > > > > > >> family, and has been true since Day One.
> > > > > > > > >> The behavior comes from the basic broad requirement that the
> > > > > > > > >> crawler should keep going and skip the document when the
> > > > > > > > >> permissions do not allow it to be fetched. With the Windows
> > > > > > > > >> Share connector, it's sometimes the case (when DFS is used a
> > > > > > > > >> lot) that whole subtrees of documents are not fetchable using
> > > > > > > > >> the credentials supplied. So it is not so easy to just check
> > > > > > > > >> for valid credentials at the beginning.
> > > > > > > > >>
> > > > > > > > >> For a solution, I'd be inclined to look for a way to figure out
> > > > > > > > >> if the credentials are actually *invalid*, and abort the job if
> > > > > > > > >> so. This is distinct from the case where the credentials are
> > > > > > > > >> valid but the connector doesn't have permissions to read the
> > > > > > > > >> document. It will take some experimentation to see if we get
> > > > > > > > >> back different exception text in the two situations.
> > > > > > > > >>
> > > > > > > > >> Karl
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> On Tue, Mar 31, 2015 at 9:30 AM, Alessandro Benedetti <
> > > > > > > > >> abenede...@apache.org> wrote:
> > > > > > > > >>
> > > > > > > > >> > Hi guys,
> > > > > > > > >> > playing with the Windows Shares Connector in ManifoldCF 1.8 I
> > > > > > > > >> > encountered this problem:
> > > > > > > > >> >
> > > > > > > > >> > *Scenario*
> > > > > > > > >> > *1)* Indexing a Windows Shares server
> > > > > > > > >> > *2)* Indexing successfully finished with N docs indexed
> > > > > > > > >> > *3)* Offline, while no indexing is happening, the Administrator
> > > > > > > > >> > password changes on the Shares server side
> > > > > > > > >> > *4)* The Repository Connector is not able to connect anymore (of
> > > > > > > > >> > course, because the password has changed)
> > > > > > > > >> > *5)* Next indexing cycle, ALL docs are removed from the index.
> > > > > > > > >> >
> > > > > > > > >> > *Expected Behaviour*
> > > > > > > > >> > As a user I would like to see an error message that lets me
> > > > > > > > >> > understand the issue, without losing all my N indexed docs.
> > > > > > > > >> >
> > > > > > > > >> > *Reason*
> > > > > > > > >> > Taking a look into the code, the problem seems to be in:
> > > > > > > > >> >
> > > > > > > > >> > org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector#getDocumentVersions
> > > > > > > > >> >
> > > > > > > > >> > where it tries to access each document individually through
> > > > > > > > >> > Samba, removing them one by one if they are not reachable
> > > > > > > > >> > anymore.
> > > > > > > > >> >
> > > > > > > > >> > *Solution*
> > > > > > > > >> > Before scanning each document, we have to be sure the connection
> > > > > > > > >> > is working.
> > > > > > > > >> > If not, this is only harmful.
> > > > > > > > >> >
> > > > > > > > >> > I will continue investigating, but I would like your opinion as
> > > > > > > > >> > well.
> > > > > > > > >> >
> > > > > > > > >> > Cheers
> > > > > > > > >> >
> > > > > > > > >> > --
> > > > > > > > >> > --------------------------
> > > > > > > > >> >
> > > > > > > > >> > Benedetti Alessandro
> > > > > > > > >> > Visiting card : http://about.me/alessandro_benedetti
> > > > > > > > >> >
> > > > > > > > >> > "Tyger, tyger burning bright
> > > > > > > > >> > In the forests of the night,
> > > > > > > > >> > What immortal hand or eye
> > > > > > > > >> > Could frame thy fearful symmetry?"
> > > > > > > > >> >
> > > > > > > > >> > William Blake - Songs of Experience -1794 England
> > > > > > > > >> >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > --------------------------
> > > > > > > > >
> > > > > > > > > Benedetti Alessandro
> > > > > > > > > Visiting card : http://about.me/alessandro_benedetti
> > > > > > > > >
> > > > > > > > > "Tyger, tyger burning bright
> > > > > > > > > In the forests of the night,
> > > > > > > > > What immortal hand or eye
> > > > > > > > > Could frame thy fearful symmetry?"
> > > > > > > > >
> > > > > > > > > William Blake - Songs of Experience -1794 England

--
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England