Hi Alessandro,

There are situations where the check() method does not succeed but you can
still crawl.  So I would not do it that way, since it fundamentally changes
the contract.

My proposal is to have processDocuments ABORT the job when it finds bad
credentials.  That's very fast and will not permit a job to run for a long
time.

The trick is to determine if there are bad credentials WITHOUT doing any
more work in the processDocuments pathway than we currently are.  An
exception will be thrown either way, but we need to figure out whether
there is any information in the exception that we can use to decide between
bad credentials and no access permissions.
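
As a rough illustration of the decision above, here is a minimal Java sketch.
It assumes the exception exposes a Windows NT status code (I believe jCIFS's
SmbException offers getNtStatus(), but that is exactly what the experiment has
to confirm).  The class, enum, and method names are hypothetical and are not
part of ManifoldCF or jCIFS; only the NTSTATUS values are standard Windows
codes.

```java
// Sketch only: CredentialFailureClassifier, Action, and classify() are
// hypothetical names, not ManifoldCF or jCIFS APIs.
final class CredentialFailureClassifier {

    /** What the crawler could do after a failed access attempt. */
    enum Action { ABORT_JOB, SKIP_DOCUMENT, RETHROW }

    // Standard Windows NTSTATUS values.
    static final int STATUS_ACCESS_DENIED    = 0xC0000022;
    static final int STATUS_WRONG_PASSWORD   = 0xC000006A;
    static final int STATUS_LOGON_FAILURE    = 0xC000006D;
    static final int STATUS_PASSWORD_EXPIRED = 0xC0000071;
    static final int STATUS_ACCOUNT_DISABLED = 0xC0000072;

    /**
     * Maps a status code to an action. Assumption: the connector can obtain
     * this code from the exception it already catches, so no extra round
     * trip to the server is needed.
     */
    static Action classify(int ntStatus) {
        switch (ntStatus) {
            case STATUS_WRONG_PASSWORD:
            case STATUS_LOGON_FAILURE:
            case STATUS_PASSWORD_EXPIRED:
            case STATUS_ACCOUNT_DISABLED:
                // The credentials themselves are bad: stop the whole job.
                return Action.ABORT_JOB;
            case STATUS_ACCESS_DENIED:
                // Credentials are valid but this document is off limits:
                // keep crawling, as the connectors do today.
                return Action.SKIP_DOCUMENT;
            default:
                // Anything else keeps its existing handling.
                return Action.RETHROW;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(STATUS_LOGON_FAILURE));
        System.out.println(classify(STATUS_ACCESS_DENIED));
    }
}
```

If the experiment shows that both cases come back with the same code, this
approach does not work as written and we would have to look at the message
text instead.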

You can help provide that by doing a simple experiment on your client's
hardware (or yours, if you have such hardware in house).  Change the
credential to an invalid one and see what the exception details are.  Then
change to valid credentials and try to crawl a directory that is not
visible to the credentialed user you supplied, and make a note of the
exception details in that case too.
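
To capture the details for both runs in a comparable form, something like the
following could be dropped in temporarily wherever the exception is caught.
This is only a sketch using plain JDK calls; ExceptionDetails and describe()
are hypothetical names, not existing ManifoldCF code.

```java
// Sketch only: a hypothetical helper for the experiment, JDK calls only.
final class ExceptionDetails {

    /**
     * Renders the exception and its chain of causes on one line, so the
     * "bad credentials" run and the "no permission" run can be compared.
     */
    static String describe(Throwable t) {
        StringBuilder sb = new StringBuilder();
        int depth = 0;
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (depth > 0) {
                sb.append(" <- caused by ");
            }
            sb.append(cur.getClass().getName())
              .append(": ")
              .append(cur.getMessage());
            depth++;
            if (depth >= 20) {
                break; // guard against a cyclic cause chain
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Exception sample = new RuntimeException("outer",
            new java.io.IOException("inner"));
        System.out.println(describe(sample));
    }
}
```

The class names in the chain matter as much as the messages, since a distinct
exception subclass would be a more reliable signal than message text.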

Karl


On Tue, Mar 31, 2015 at 10:50 AM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> Currently we check each entry of the String[] oldVersions by trying to
> access it, so in the scenario I described performance is quite bad.
> We need to avoid scanning the oldDocs at all if we know the provided
> credentials are no longer valid.
>
> Let me be extreme: what about not allowing the job to start at all if
> the Repository Connector is currently broken (i.e. the connection is not
> working, and we know that from the check method)?
> That way we avoid destroying existing indexes and simply show a message
> in the job saying that it cannot start because the Repository Connector
> is currently offline (along with the explanation).
>
> Does this make sense?
>
> 2015-03-31 15:30 GMT+01:00 Karl Wright <daddy...@gmail.com>:
>
> > Hi Alessandro,
> >
> > If you put a check in the processDocuments method, it will be called for
> > every group of documents.  That's fine, but if you structure it as a
> > separate call it would impact performance.  That is why I suggest just
> > doing a better job of interpreting the existing exceptions.
> >
> > Karl
> >
> >
> > On Tue, Mar 31, 2015 at 10:27 AM, Alessandro Benedetti <
> > benedetti.ale...@gmail.com> wrote:
> >
> > > In addition, this should be quite simple: do not proceed with the
> > > processDocuments method if the RepositoryConnector is not able to
> > > connect (i.e. the check method does not return a proper message).
> > >
> > > Right?
> > > I am wondering where the proper place to hook in this action is :)
> > >
> > > Cheers
> > >
> > > 2015-03-31 14:59 GMT+01:00 Alessandro Benedetti <
> > > benedetti.ale...@gmail.com>
> > > :
> > >
> > > > > Yes Karl,
> > > > > I was thinking exactly that: first check whether the credentials
> > > > > are valid, before scanning all the documents.
> > > > > This is because per-file permissions depend on users/groups, but
> > > > > the current scenario does not invalidate the user, only that
> > > > > user's access.
> > > > >
> > > > > An error must be thrown, but the docs must not be deleted (not
> > > > > even scanned).
> > > > >
> > > > > Furthermore, what will happen if the server is down?
> > > > > Are we safe in that scenario?
> > > >
> > > > Cheers
> > > >
> > > > 2015-03-31 14:42 GMT+01:00 Karl Wright <daddy...@gmail.com>:
> > > >
> > > > >> This is actually pretty standard behavior across our connector
> > > > >> family, and has been true since Day One.  The behavior comes from
> > > > >> the basic broad requirement that the crawler should keep going and
> > > > >> skip the document when the permissions do not allow it to be
> > > > >> fetched.  With the Windows Share connector, it's sometimes the case
> > > > >> (when DFS is used a lot) that whole subtrees of documents are not
> > > > >> fetchable using the credentials supplied.  So it is not so easy to
> > > > >> just check for valid credentials at the beginning.
> > > > >>
> > > > >> For a solution, I'd be inclined to look for a way to figure out if
> > > > >> the credentials are actually *invalid*, and abort the job if so.
> > > > >> This is distinct from the case where the credentials are valid but
> > > > >> the connector doesn't have permissions to read the document.  It
> > > > >> will take some experimentation to see if we get back different
> > > > >> exception text in the two situations.
> > > >>
> > > >> Karl
> > > >>
> > > >>
> > > >> On Tue, Mar 31, 2015 at 9:30 AM, Alessandro Benedetti <
> > > >> abenede...@apache.org
> > > >> > wrote:
> > > >>
> > > > >> > Hi guys,
> > > > >> > playing with the Windows Shares Connector in ManifoldCF 1.8, I
> > > > >> > encountered this problem:
> > > > >> >
> > > > >> > *Scenario*
> > > > >> > *1)* Indexing a Windows Shares server
> > > > >> > *2)* Indexing finishes successfully with N docs indexed
> > > > >> > *3)* Offline, while no indexing is happening, the Administrator
> > > > >> > password is changed on the Shares server side
> > > > >> > *4)* The Repository Connector is no longer able to connect (of
> > > > >> > course, because the password has changed)
> > > > >> > *5)* On the next indexing cycle, ALL docs are removed from the index.
> > > > >> >
> > > > >> > *Expected Behaviour*
> > > > >> > As a user I would like to see an error message that lets me
> > > > >> > understand the issue, without losing all my N indexed docs.
> > > > >> >
> > > > >> > *Reason*
> > > > >> > Looking at the code, the problem seems to be in
> > > > >> > org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector#getDocumentVersions
> > > > >> > where it tries to access each document individually through Samba,
> > > > >> > removing them one by one if they are no longer reachable.
> > > > >> >
> > > > >> > *Solution*
> > > > >> > Before scanning each document, we have to be sure the connection
> > > > >> > is working.
> > > > >> > If it is not, the scan is only harmful.
> > > > >> >
> > > > >> > I will continue investigating, but I would like your opinion as
> > > > >> > well.
> > > > >> >
> > > > >> > Cheers
> > > >> >
> > > >> > --
> > > >> > --------------------------
> > > >> >
> > > >> > Benedetti Alessandro
> > > >> > Visiting card : http://about.me/alessandro_benedetti
> > > >> >
> > > >> > "Tyger, tyger burning bright
> > > >> > In the forests of the night,
> > > >> > What immortal hand or eye
> > > >> > Could frame thy fearful symmetry?"
> > > >> >
> > > >> > William Blake - Songs of Experience -1794 England
> > > >> >
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > --------------------------
> > > >
> > > > Benedetti Alessandro
> > > > Visiting card : http://about.me/alessandro_benedetti
> > > >
> > > > "Tyger, tyger burning bright
> > > > In the forests of the night,
> > > > What immortal hand or eye
> > > > Could frame thy fearful symmetry?"
> > > >
> > > > William Blake - Songs of Experience -1794 England
> > > >
> > >
> > >
> >
>
>
