[
https://issues.apache.org/jira/browse/CONNECTORS-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693169#comment-13693169
]
Karl Wright commented on CONNECTORS-728:
----------------------------------------
Hi Osuka-san,
I've looked at the new code. It is good that you added the HDFSSession class.
There are still some issues, though - please see below.
First, this code:
{code}
  try {
    Path path = objt.getResponse();
    if (session.getFileSystem().exists(path)) {
      if (session.getFileSystem().getFileStatus(path).isDir()) {
        long lastModified = session.getFileSystem().getFileStatus(path).getModificationTime();
        rval[i] = new Long(lastModified).toString();
      } else {
        long fileLength = session.getFileSystem().getFileStatus(path).getLen();
        if (activities.checkLengthIndexable(fileLength)) {
          long lastModified = session.getFileSystem().getFileStatus(path).getModificationTime();
          StringBuilder sb = new StringBuilder();
          if (filePathToUri) {
            sb.append("+");
          } else {
            sb.append("-");
          }
          sb.append(new Long(lastModified).toString()).append(":").append(new Long(fileLength).toString());
          rval[i] = sb.toString();
        } else {
          rval[i] = null;
        }
      }
    } else {
      rval[i] = null;
    }
  } catch (IOException e) {
    objt.interrupt();
    throw new ManifoldCFException(e);
  }
}
{code}
The problem here is that methods that communicate with sockets can wait on
those sockets forever, and cannot be interrupted. So when the ManifoldCF
agents process tries to shut down, if there are worker threads waiting in this
way, they cannot be stopped.
We usually avoid this problem by having a "background" thread do the actual
work of using the socket. Elsewhere in this connector, for instance, you have
a GetSeedsThread, which seems to be correct. But in your GetObjectThread, the
only code you call is this:
{code}
public Path getObject(String id) {
  return new Path(id);
}
{code}
... which doesn't, I think, do any socket work at all! Instead the socket work
is happening here:
{code}
session.getFileSystem().exists(path)
session.getFileSystem().getFileStatus(path).isDir()
session.getFileSystem().getFileStatus(path).getModificationTime()
etc.
{code}
So those calls must happen in a background thread instead. What I suggest
you consider is to create a new class, maybe called FileMetadata, which
contains all these values as members. Your session.getObject() method would
then create a FileMetadata instance instead of just creating the Path, and
fill it in right away with all the values you will need later.
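As a rough sketch of what I mean: the StatusSource interface and the fetch()
helper below are hypothetical stand-ins for the HDFS FileSystem calls and for
your session code, and the field names are only a suggestion. The point is
that every call that can block on a socket happens inside run(), and the
worker thread only waits in join(), which can be interrupted.

```java
import java.io.IOException;

// Illustrative only: not the connector's actual code.
final class FileMetadataSketch {

  /** Hypothetical stand-in for the HDFS calls (exists, getFileStatus, ...). */
  interface StatusSource {
    boolean exists(String path) throws IOException;
    boolean isDirectory(String path) throws IOException;
    long modificationTime(String path) throws IOException;
    long length(String path) throws IOException;
  }

  /** Immutable snapshot of every value the worker thread needs afterwards. */
  static final class FileMetadata {
    final boolean exists;
    final boolean directory;
    final long lastModified;
    final long length;

    FileMetadata(boolean exists, boolean directory, long lastModified, long length) {
      this.exists = exists;
      this.directory = directory;
      this.lastModified = lastModified;
      this.length = length;
    }
  }

  /** Background thread: all potentially blocking calls are made in run(). */
  static final class GetObjectThread extends Thread {
    private final StatusSource source;
    private final String path;
    // Written in run(), read only after join(), which makes the writes visible.
    private FileMetadata response;
    private IOException exception;

    GetObjectThread(StatusSource source, String path) {
      this.source = source;
      this.path = path;
      setDaemon(true); // a stuck socket must not keep the JVM alive
    }

    @Override
    public void run() {
      try {
        if (!source.exists(path)) {
          response = new FileMetadata(false, false, -1L, -1L);
        } else {
          response = new FileMetadata(true, source.isDirectory(path),
              source.modificationTime(path), source.length(path));
        }
      } catch (IOException e) {
        exception = e; // handed back to the caller in finishUp()
      }
    }

    /** Waits for the thread; join() is interruptible, unlike a socket read. */
    FileMetadata finishUp() throws IOException, InterruptedException {
      join();
      if (exception != null) {
        throw exception;
      }
      return response;
    }
  }

  /** What the worker thread would do instead of calling the file system itself. */
  static FileMetadata fetch(StatusSource source, String path)
      throws IOException, InterruptedException {
    GetObjectThread t = new GetObjectThread(source, path);
    t.start();
    try {
      return t.finishUp();
    } catch (InterruptedException e) {
      t.interrupt(); // ask the background thread to stop, then give up waiting
      throw e;
    }
  }
}
```

The version-string logic from your first code block would then read
metadata.exists, metadata.directory, and so on, without touching the file
system again.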
Second, I see you have a BackgroundStreamThread, which looks like it might work
if run in the background to transfer data from HDFS, but it is not used
anywhere. It needs to be instantiated and started to do anything.
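For illustration, here is one way a stream thread could be instantiated and
run. This is only a sketch under my own assumptions: it uses the JDK's piped
streams to hand data to the caller, the readAll() helper is hypothetical, and
the class is not meant to match the internals of your BackgroundStreamThread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// Illustrative only: not the connector's actual code.
final class StreamThreadSketch {

  /** Reads the (possibly blocking) source in the background and feeds a pipe. */
  static final class BackgroundStreamThread extends Thread {
    private final InputStream source;
    private final PipedOutputStream sink;
    // Written in run(), read only after join().
    private IOException exception;

    BackgroundStreamThread(InputStream source, PipedInputStream consumerEnd)
        throws IOException {
      this.source = source;
      this.sink = new PipedOutputStream(consumerEnd);
      setDaemon(true);
    }

    @Override
    public void run() {
      byte[] buffer = new byte[8192];
      // Closing the sink signals end-of-stream to the consumer.
      try (InputStream in = source; PipedOutputStream out = sink) {
        int n;
        while ((n = in.read(buffer)) != -1) {
          out.write(buffer, 0, n);
        }
      } catch (IOException e) {
        exception = e; // reported to the caller in finishUp()
      }
    }

    void finishUp() throws IOException, InterruptedException {
      join();
      if (exception != null) {
        throw exception;
      }
    }
  }

  /** Hypothetical caller: create the thread, start it, consume, then join. */
  static byte[] readAll(InputStream source) throws IOException, InterruptedException {
    PipedInputStream consumerEnd = new PipedInputStream();
    BackgroundStreamThread t = new BackgroundStreamThread(source, consumerEnd);
    t.start();
    ByteArrayOutputStream collected = new ByteArrayOutputStream();
    try {
      byte[] buffer = new byte[4096];
      int n;
      while ((n = consumerEnd.read(buffer)) != -1) {
        collected.write(buffer, 0, n);
      }
      t.finishUp(); // surfaces any IOException from the background read
      return collected.toByteArray();
    } catch (InterruptedException e) {
      t.interrupt();
      throw e;
    } finally {
      consumerEnd.close();
    }
  }
}
```

In the connector the consumer would pass the stream on for indexing rather
than collect it in memory; the byte array here just keeps the example short.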
Hope this helps.
> Add HDFS connector.
> -------------------
>
> Key: CONNECTORS-728
> URL: https://issues.apache.org/jira/browse/CONNECTORS-728
> Project: ManifoldCF
> Issue Type: Improvement
> Affects Versions: ManifoldCF 1.3
> Reporter: Minoru Osuka
> Assignee: Minoru Osuka
> Priority: Minor
>
> I would like to propose an HDFS connector.