Hi Karl, comments follow:

2015-03-31 16:18 GMT+01:00 Karl Wright <daddy...@gmail.com>:

> Hi Alessandro,
>
> There are situations where the check() method does not succeed but you can
> still crawl.  So I would not do it that way, since it fundamentally changes
> the contract.
>

Am I wrong, or should we assume that the check() method works as designed?
I mean, if this method is wrongly implemented in some cases, that should
not be a reason to break another assumption.

>
> My proposal is to have processDocuments ABORT the job when it finds bad
> credentials.  That's very fast and will not permit a job to run for a long
> time.
>
> The trick is to determine if there are bad credentials WITHOUT doing any
> more work in the processDocuments pathway than we currently are.  An
> exception will be thrown either way, but we need to figure out whether
> there is any information in the exception that we can use to decide between
> bad credentials and no access permissions.
>
> You can help provide that by doing a simple experiment on your client's
> hardware (or yours, if you have such hardware in house).  Change the
> credential to an invalid one and see what the exception details are.  Then
> change to valid credentials and try to crawl a directory that is not
> visible to the credentialed user you supplied, and make a note of the
> exception details in that case too.
>
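
To check that I read the proposal correctly, here is a minimal sketch of the skip-versus-abort behaviour. All names in it (BatchSketch, Failure, Fetcher, AbortJobException) are hypothetical stand-ins I made up for illustration, not ManifoldCF types, and how a real fetch would report the failure kind is exactly the open question:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {
    /** Hypothetical outcome of fetching one document. */
    enum Failure { NONE, ACCESS_DENIED, BAD_CREDENTIALS }

    /** Hypothetical stand-in for "abort the job". */
    static class AbortJobException extends RuntimeException {
        AbortJobException(String message) { super(message); }
    }

    /** Hypothetical fetcher: reports how the fetch of one document ended. */
    interface Fetcher {
        Failure fetch(String documentId);
    }

    /**
     * Processes one group of documents with no extra round trip:
     * the only information used is the outcome of the fetch itself.
     * Returns the ids that were skipped because they were not readable.
     */
    static List<String> processBatch(List<String> ids, Fetcher fetcher) {
        List<String> skipped = new ArrayList<>();
        for (String id : ids) {
            Failure failure = fetcher.fetch(id);
            if (failure == Failure.BAD_CREDENTIALS) {
                // Stop immediately, before any document is treated as gone.
                throw new AbortJobException("Credentials rejected while fetching " + id);
            }
            if (failure == Failure.ACCESS_DENIED) {
                // Existing behaviour: keep crawling and skip this document.
                skipped.add(id);
            }
        }
        return skipped;
    }
}
```

With this shape, a directory that is not visible to the user only adds entries to the skipped list, while a rejected password stops the batch at the first document.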

I was thinking of slightly modifying the getSession() method by adding a
file-existence check, something like this:

...

try
{
    // use NtlmPasswordAuthentication so that we can reuse credential for DFS support
    pa = new NtlmPasswordAuthentication( domain, username, password );
    SmbFile smbconnection = new SmbFile( "smb://" + server + "/", pa );
    smbconnectionPath = getFileCanonicalPath( smbconnection );
    // exists() contacts the server; the intent is that rejected credentials
    // surface here as an SmbException
    smbconnection.exists();
}
catch ( MalformedURLException e )
{
    Logging.connectors.error(
        "Unable to access SMB/CIFS share: " + "smb://" + ( ( domain == null ) ? "" : domain ) + ";"
            + username + ":<password>@" + server + "/\n" + e );
    throw new ManifoldCFException( "Unable to access SMB/CIFS share: " + server, e,
        ManifoldCFException.REPOSITORY_CONNECTION_ERROR );
}
catch ( SmbException e )
{
    Logging.connectors.error(
        "Unable to access SMB/CIFS share: Credential not valid - "
            + "smb://" + ( ( domain == null ) ? "" : domain ) + ";"
            + username + ":<password>@" + server + "/\n" + e );
    throw new ManifoldCFException( "Unable to access SMB/CIFS share: Credential not valid - " + server, e,
        ManifoldCFException.REPOSITORY_CONNECTION_ERROR );
}

Catching the SmbException should do the trick.
Anyway, I will look into it in more detail.
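
For the experiment you suggested, one thing I would look at is the NT status code rather than the exception text. The sketch below is only an assumption about how the two cases might be told apart: the class and enum names are hypothetical, the constants are the standard Windows NTSTATUS values, and I am assuming the code would come from SmbException.getNtStatus(). Whether jcifs really reports these codes in the two scenarios still has to be verified on real hardware.

```java
public class SmbFailureClassifier {
    /** Hypothetical classification used only in this sketch. */
    enum Kind { BAD_CREDENTIALS, ACCESS_DENIED, OTHER }

    // Standard Windows NTSTATUS values.
    static final int NT_STATUS_ACCESS_DENIED    = 0xC0000022;
    static final int NT_STATUS_WRONG_PASSWORD   = 0xC000006A;
    static final int NT_STATUS_LOGON_FAILURE    = 0xC000006D;
    static final int NT_STATUS_PASSWORD_EXPIRED = 0xC0000071;
    static final int NT_STATUS_ACCOUNT_DISABLED = 0xC0000072;

    /**
     * Maps an NT status code to a failure kind.
     * Assumption: the caller obtained the code from the SMB exception.
     */
    static Kind classify(int ntStatus) {
        switch (ntStatus) {
            case NT_STATUS_WRONG_PASSWORD:
            case NT_STATUS_LOGON_FAILURE:
            case NT_STATUS_PASSWORD_EXPIRED:
            case NT_STATUS_ACCOUNT_DISABLED:
                // The user itself cannot log on: candidate for aborting the job.
                return Kind.BAD_CREDENTIALS;
            case NT_STATUS_ACCESS_DENIED:
                // The user is valid but may not read this resource: skip it.
                return Kind.ACCESS_DENIED;
            default:
                // Anything else (server down, timeouts, ...) is left to existing handling.
                return Kind.OTHER;
        }
    }
}
```

If the codes turn out to be reliable, the catch block above could throw the "Credential not valid" error only for BAD_CREDENTIALS and leave the other cases as they are today.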

Cheers


> Karl
>
>
> On Tue, Mar 31, 2015 at 10:50 AM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
>
> > Currently we check each of the String[] oldVersions by trying to
> > access it.
> > So in the scenario I described the current performance is quite bad.
> > We need to avoid scanning the oldDocs at all if we know the
> > provided credentials are no longer valid.
> >
> > Let me be extreme, but what about not allowing the job to start at all if
> > the Repository Connector is currently broken (i.e. the connection is not
> > working, and we know that because of the check method)?
> > In this way we avoid destroying existing indexes, and we simply
> > show a message in the job stating that the job cannot start
> > because the Repository Connector is currently offline (along with the
> > explanation).
> >
> > Does this make sense?
> >
> > 2015-03-31 15:30 GMT+01:00 Karl Wright <daddy...@gmail.com>:
> >
> > > Hi Alessandro,
> > >
> > > If you put a check in the processDocuments method, it will be called
> > > for every group of documents.  That's fine, but if you structure it as
> > > a separate call it would impact performance.  That is why I suggest
> > > just doing a better job of interpreting the existing exceptions.
> > >
> > > Karl
> > >
> > >
> > > On Tue, Mar 31, 2015 at 10:27 AM, Alessandro Benedetti <
> > > benedetti.ale...@gmail.com> wrote:
> > >
> > > > As an addition, this should be quite simple: do not proceed with the
> > > > processDocuments method if the RepositoryConnector is not able to
> > > > connect (i.e. the check method does not return a success message).
> > > >
> > > > Right?
> > > > I am wondering where the proper place to add this is :)
> > > >
> > > > Cheers
> > > >
> > > > 2015-03-31 14:59 GMT+01:00 Alessandro Benedetti <benedetti.ale...@gmail.com>:
> > > >
> > > > > Yes Karl,
> > > > > I was thinking exactly that: first check whether the credentials
> > > > > are valid, before scanning all the documents.
> > > > > This is because per-file permissions depend on users/groups, but
> > > > > the current scenario does not invalidate the user; it invalidates
> > > > > that user's access.
> > > > >
> > > > > An error must be thrown, but the docs must not be deleted (not
> > > > > even scanned).
> > > > >
> > > > > Furthermore, what will happen if the server is down?
> > > > > Are we safe in that scenario?
> > > > >
> > > > > Cheers
> > > > >
> > > > > 2015-03-31 14:42 GMT+01:00 Karl Wright <daddy...@gmail.com>:
> > > > >
> > > > > >> This is actually pretty standard behavior across our connector
> > > > > >> family, and has been true since Day One.  The behavior comes
> > > > > >> from the basic broad requirement that the crawler should keep
> > > > > >> going and skip the document when the permissions do not allow
> > > > > >> it to be fetched.  With the Windows Share connector, it's
> > > > > >> sometimes the case (when DFS is used a lot) that whole subtrees
> > > > > >> of documents are not fetchable using the credentials supplied.
> > > > > >> So it is not so easy to just check for valid credentials at the
> > > > > >> beginning.
> > > > >>
> > > > > >> For a solution, I'd be inclined to look for a way to figure out
> > > > > >> if the credentials are actually *invalid*, and abort the job if
> > > > > >> so.  This is distinct from the case where the credentials are
> > > > > >> valid but the connector doesn't have permissions to read the
> > > > > >> document.  It will take some experimentation to see if we get
> > > > > >> back different exception text in the two situations.
> > > > >>
> > > > >> Karl
> > > > >>
> > > > >>
> > > > > >> On Tue, Mar 31, 2015 at 9:30 AM, Alessandro Benedetti <abenede...@apache.org> wrote:
> > > > >>
> > > > >> > Hi guys,
> > > > > >> > playing with the Windows Shares Connector in ManifoldCF 1.8, I
> > > > > >> > encountered this problem:
> > > > > >> >
> > > > > >> > *Scenario*
> > > > > >> > *1)* Indexing a Windows Shares server
> > > > > >> > *2)* Indexing finishes successfully with N docs indexed
> > > > > >> > *3)* Offline, while no indexing is happening, the Administrator
> > > > > >> > password is changed on the Shares server side
> > > > > >> > *4)* The Repository Connector is no longer able to connect (of
> > > > > >> > course, because the password has changed)
> > > > > >> > *5)* On the next indexing cycle, ALL docs are removed from the index.
> > > > > >> >
> > > > > >> > *Expected Behaviour*
> > > > > >> > As a user I would like to see an error message that lets me
> > > > > >> > understand the issue, without losing all my N indexed docs.
> > > > >> >
> > > > >> > *Reason*
> > > > > >> > Taking a look at the code, the problem seems to be in:
> > > > > >> >
> > > > > >> > org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector#getDocumentVersions
> > > > > >> >
> > > > > >> > where it tries to access each document individually through
> > > > > >> > Samba, removing them one by one if they are no longer reachable.
> > > > >> >
> > > > >> > *Solution*
> > > > > >> > Before scanning each document, we have to be sure the connection
> > > > > >> > is working.
> > > > > >> > If it is not, the scan is only harmful.
> > > > > >> >
> > > > > >> > I will continue investigating, but I would like your opinion as
> > > > > >> > well.
> > > > >> >
> > > > >> > Cheers
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > --
> > > > >> > --------------------------
> > > > >> >
> > > > >> > Benedetti Alessandro
> > > > >> > Visiting card : http://about.me/alessandro_benedetti
> > > > >> >
> > > > >> > "Tyger, tyger burning bright
> > > > >> > In the forests of the night,
> > > > >> > What immortal hand or eye
> > > > >> > Could frame thy fearful symmetry?"
> > > > >> >
> > > > >> > William Blake - Songs of Experience -1794 England
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > --------------------------
> > > > >
> > > > > Benedetti Alessandro
> > > > > Visiting card : http://about.me/alessandro_benedetti
> > > > >
> > > > > "Tyger, tyger burning bright
> > > > > In the forests of the night,
> > > > > What immortal hand or eye
> > > > > Could frame thy fearful symmetry?"
> > > > >
> > > > > William Blake - Songs of Experience -1794 England
> > > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> >
>



-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
