[ 
https://issues.apache.org/jira/browse/CONNECTORS-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693169#comment-13693169
 ] 

Karl Wright commented on CONNECTORS-728:
----------------------------------------

Hi Osuka-san,

I've looked at the new code.  It is good that you added the HDFSSession class.  
There are still some issues, though - please see below.

First, this code:

{code}
      try {
        Path path = objt.getResponse();
        if (session.getFileSystem().exists(path)) {
          if (session.getFileSystem().getFileStatus(path).isDir()) {
            long lastModified = session.getFileSystem().getFileStatus(path).getModificationTime();
            rval[i] = new Long(lastModified).toString();
          } else {
            long fileLength = session.getFileSystem().getFileStatus(path).getLen();
            if (activities.checkLengthIndexable(fileLength)) {
              long lastModified = session.getFileSystem().getFileStatus(path).getModificationTime();
              StringBuilder sb = new StringBuilder();
              if (filePathToUri) {
                sb.append("+");
              } else {
                sb.append("-");
              }
              sb.append(new Long(lastModified).toString()).append(":").append(new Long(fileLength).toString());
              rval[i] = sb.toString();
            } else {
              rval[i] = null;
            }
          }
        } else {
          rval[i] = null;
        }
      } catch (IOException e) {
        objt.interrupt();
        throw new ManifoldCFException(e);
      }
    }
{code}

The problem here is that methods that communicate over sockets can block on 
those sockets indefinitely, and cannot be interrupted.  So when the ManifoldCF 
agents process tries to shut down, any worker threads blocked in this way 
cannot be stopped.

We usually avoid this problem by having a "background" thread do the actual 
socket work.  Elsewhere in this connector, for instance, you have a 
GetSeedsThread, which seems to be correct.  But in your GetObjectThread, you 
only call this code:

{code}
  public Path getObject(String id) {
    return new Path(id);
  }
{code}

... which doesn't, I think, do any socket work at all!  Instead the socket work 
is happening here:

{code}
session.getFileSystem().exists(path)
session.getFileSystem().getFileStatus(path).isDir()
session.getFileSystem().getFileStatus(path).getModificationTime()
etc.
{code}

So those calls must happen in a background thread instead.  What I suggest 
you consider is to create a new class, maybe called FileMetadata, which 
holds all these values as members.  Then, instead of just creating the Path, 
your session.getObject() method would create a FileMetadata instance and fill 
it in right away with all the values you will need.
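
To make the suggestion concrete, here is a minimal sketch of the shape I have 
in mind.  It is not the connector's code: MetadataSource is a hypothetical 
stand-in for the session calls that touch the socket (exists(), 
getFileStatus() and so on), and the field names, the finishUp() method and the 
getObject() helper are assumptions for illustration only.

```java
import java.io.IOException;

final class FileMetadataSketch {

  /** Snapshot of every value the worker thread needs, filled in once. */
  static final class FileMetadata {
    final boolean exists;
    final boolean directory;
    final long lastModified;
    final long length;

    FileMetadata(boolean exists, boolean directory, long lastModified, long length) {
      this.exists = exists;
      this.directory = directory;
      this.lastModified = lastModified;
      this.length = length;
    }
  }

  /** Hypothetical stand-in for the socket-bound lookups done through the session. */
  interface MetadataSource {
    FileMetadata lookup(String path) throws IOException;
  }

  /** Background thread: the only place where the blocking lookup runs. */
  static final class GetObjectThread extends Thread {
    private final MetadataSource source;
    private final String path;
    private FileMetadata response;
    private Throwable exception;

    GetObjectThread(MetadataSource source, String path) {
      this.source = source;
      this.path = path;
      setDaemon(true); // a thread stuck on a socket must not keep the JVM alive
    }

    @Override
    public void run() {
      try {
        response = source.lookup(path);
      } catch (Throwable t) {
        exception = t; // handed to the waiting thread in finishUp()
      }
    }

    /** Waits interruptibly, then returns the result or rethrows the failure. */
    FileMetadata finishUp() throws InterruptedException, IOException {
      join();
      if (exception instanceof IOException) throw (IOException) exception;
      if (exception instanceof RuntimeException) throw (RuntimeException) exception;
      if (exception instanceof Error) throw (Error) exception;
      return response;
    }
  }

  /** Worker-side call: only waits on join(), which can be interrupted. */
  static FileMetadata getObject(MetadataSource source, String path)
      throws InterruptedException, IOException {
    GetObjectThread t = new GetObjectThread(source, path);
    t.start();
    try {
      return t.finishUp();
    } catch (InterruptedException e) {
      t.interrupt();
      throw e;
    }
  }
}
```

With something like this, the worker thread only reads fields from the 
returned FileMetadata and never calls the file system directly, so a shutdown 
interrupt reaches it while it waits in join().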

Second, I see you have a BackgroundStreamThread, which looks like it might work 
if run in the background to transfer data from HDFS, but it is not used 
anywhere.  It needs to be instantiated and started before it does anything.
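
As a rough illustration of what "instantiated and run" means, here is a small 
sketch.  I am not reproducing your BackgroundStreamThread here; 
StreamCopyThread, finishUp() and fetch() are hypothetical names, and the 
in-memory buffer is only there to keep the example self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamThreadSketch {

  /** Hypothetical background thread that copies a stream to a sink. */
  static final class StreamCopyThread extends Thread {
    private final InputStream in;
    private final OutputStream out;
    private IOException exception;

    StreamCopyThread(InputStream in, OutputStream out) {
      this.in = in;
      this.out = out;
      setDaemon(true); // do not block JVM exit if the read hangs
    }

    @Override
    public void run() {
      byte[] buffer = new byte[8192];
      try {
        int n;
        while ((n = in.read(buffer)) != -1) {
          out.write(buffer, 0, n);
        }
      } catch (IOException e) {
        exception = e; // reported to the caller in finishUp()
      }
    }

    /** Waits interruptibly for the copy and rethrows any I/O failure. */
    void finishUp() throws InterruptedException, IOException {
      join();
      if (exception != null) throw exception;
    }
  }

  /** Caller side: the thread must be constructed AND started. */
  static byte[] fetch(InputStream in) throws InterruptedException, IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    StreamCopyThread t = new StreamCopyThread(in, out);
    t.start(); // without this call the class is dead code
    try {
      t.finishUp();
    } catch (InterruptedException e) {
      t.interrupt();
      throw e;
    }
    return out.toByteArray();
  }
}
```

The point is only the start()/finishUp() pair on the calling side; how the 
data is handed to the worker thread is a separate design choice.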

Hope this helps.  

                
> Add HDFS connector.
> -------------------
>
>                 Key: CONNECTORS-728
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-728
>             Project: ManifoldCF
>          Issue Type: Improvement
>    Affects Versions: ManifoldCF 1.3
>            Reporter: Minoru Osuka
>            Assignee: Minoru Osuka
>            Priority: Minor
>
> I would like to suggest you the HDFS Connector.

