[
https://issues.apache.org/jira/browse/CONNECTORS-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691869#comment-13691869
]
Karl Wright commented on CONNECTORS-728:
----------------------------------------
I looked briefly at the repository connector - not a complete review, but here
are my impressions.
(1) Error handling. It needs to be much more careful about dealing properly
with exceptions. For instance:
{code}
/*
 * get connection to HDFS
 */
try {
  fileSystem = FileSystem.get(new URI(nameNode), config, user);
} catch (URISyntaxException e) {
} catch (IOException e) {
} catch (InterruptedException e) {
}
}
{code}
In particular, you probably want to create a getSession() method that you
call only when you need the connection initialized, and that can throw
ManifoldCFException and ServiceInterruption. If you set up the connection in
connect() instead, you lose critical exception info, as the empty catch blocks
above show. See the Google Drive connector for a model of how to do this.
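To make the suggestion concrete, here is a minimal, self-contained sketch of the lazy getSession() pattern. It is not the Google Drive connector's code. The exception classes are simplified stand-ins for the ManifoldCF types, SessionFactory is a hypothetical placeholder for the FileSystem.get(...) call, and the mapping of each caught exception to a thrown one is an assumption for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

public class LazySessionSketch {
  /** Simplified stand-in for the ManifoldCF exception of the same name. */
  public static class ManifoldCFException extends Exception {
    public ManifoldCFException(String msg, Throwable cause) { super(msg, cause); }
  }

  /** Simplified stand-in for the ManifoldCF exception of the same name. */
  public static class ServiceInterruption extends Exception {
    public ServiceInterruption(String msg, Throwable cause) { super(msg, cause); }
  }

  /** Hypothetical placeholder for FileSystem.get(uri, config, user). */
  public interface SessionFactory {
    Object open(URI nameNode, String user) throws IOException, InterruptedException;
  }

  public static class Connector {
    private final SessionFactory factory;
    private String nameNode;
    private String user;
    private Object session; // stays null until first use

    public Connector(SessionFactory factory) { this.factory = factory; }

    /** connect() only records configuration; it performs no I/O. */
    public void connect(String nameNode, String user) {
      this.nameNode = nameNode;
      this.user = user;
    }

    /** Opens the session on first use and reports failures to the caller. */
    public Object getSession() throws ManifoldCFException, ServiceInterruption {
      if (session != null) {
        return session;
      }
      try {
        session = factory.open(new URI(nameNode), user);
        return session;
      } catch (URISyntaxException e) {
        // Configuration problem: retrying will not help.
        throw new ManifoldCFException("Bad name node URI: " + e.getMessage(), e);
      } catch (IOException e) {
        // Possibly transient: let the framework decide whether to retry.
        throw new ServiceInterruption("HDFS unavailable: " + e.getMessage(), e);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt flag
        throw new ManifoldCFException("Interrupted: " + e.getMessage(), e);
      }
    }
  }
}
```

Every method that needs the file system would call getSession() first, so a failure surfaces at the point of use with its cause attached instead of being swallowed in connect().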
(2) You are using a deprecated method for doing seeding:
{code}
@Override
public IDocumentIdentifierStream getDocumentIdentifiers(DocumentSpecification spec, long startTime, long endTime)
  throws ManifoldCFException
{
  return new IdentifierStream(spec);
}
{code}
Instead, look up addSeedDocuments().
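As a rough illustration of the difference, addSeedDocuments() pushes identifiers to the framework through a callback rather than returning a stream. The sketch below only shows that shape. SeedingActivity is a hypothetical stand-in for the framework's callback interface, and the method signature is simplified; the real one takes more parameters, so check the repository connector javadoc before copying anything.

```java
import java.util.List;

public class SeedingSketch {
  /** Hypothetical stand-in for the framework's seeding callback. */
  public interface SeedingActivity {
    void addSeedDocument(String documentIdentifier);
  }

  /**
   * Push-style seeding: hand each configured root path to the framework.
   * Simplified signature, for illustration only.
   */
  public static void addSeedDocuments(SeedingActivity activities, List<String> seedPaths) {
    for (String path : seedPaths) {
      activities.addSeedDocument(path);
    }
  }
}
```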
I'll have more comments later.
> Add HDFS connector.
> -------------------
>
> Key: CONNECTORS-728
> URL: https://issues.apache.org/jira/browse/CONNECTORS-728
> Project: ManifoldCF
> Issue Type: Improvement
> Affects Versions: ManifoldCF 1.3
> Reporter: Minoru Osuka
> Assignee: Minoru Osuka
> Priority: Minor
>
> I would like to suggest you the HDFS Connector.