Harsh J created AVRO-1244:
-----------------------------

             Summary: Provide a SeekableInput implementation for FileSystem retrieved output streams
                 Key: AVRO-1244
                 URL: https://issues.apache.org/jira/browse/AVRO-1244
             Project: Avro
          Issue Type: Improvement
          Components: java
            Reporter: Harsh J
            Priority: Minor


To use the DataFileWriter#appendTo API, one needs to pass a SeekableInput 
object. Avro provides SeekableFileInput for files that can be represented by a 
java.io.File, but in Hadoop, files on HDFS and other FileSystem implementations 
can't be represented by a File object, so users have to take a longer route and 
implement this interface themselves.

We could add a simple HadoopSeekableFSInput (or similarly named) class that takes 
Hadoop-provided objects and wraps them in a SeekableInput, ready to pass to Avro.

I propose something like the following:

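{code}
import java.io.IOException;

import org.apache.avro.file.SeekableInput;
import org.apache.hadoop.fs.FSDataInputStream;

/**
 * Wraps a Hadoop FSDataInputStream so it can be passed wherever Avro expects
 * a SeekableInput, e.g. DataFileWriter#appendTo.
 */
public class HadoopSeekableFSInput implements SeekableInput {
  private final FSDataInputStream in;
  // FSDataInputStream does not expose the file length, so the caller supplies it.
  private final long length;

  public HadoopSeekableFSInput(FSDataInputStream in, long length) {
    this.in = in;
    this.length = length;
  }

  @Override
  public void close() throws IOException {
    in.close();
  }

  @Override
  public void seek(long p) throws IOException {
    in.seek(p);
  }

  @Override
  public long tell() throws IOException {
    return in.getPos();
  }

  @Override
  public long length() throws IOException {
    return length;
  }

  @Override
  public int read(byte[] b, int off, int len) throws IOException {
    // Delegates directly; advances the stream position by the bytes read.
    return in.read(b, off, len);
  }
}
{code}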

Users can then construct one with a simple call such as {{new 
HadoopSeekableFSInput(fs.open(filePath), fs.getFileStatus(filePath).getLen())}}.
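The proposed adapter depends on Hadoop and Avro. To show the same wrapping pattern with nothing but the JDK, here is a minimal sketch under stated assumptions: LocalSeekableInput is a hypothetical stand-in that declares the same four methods as Avro's org.apache.avro.file.SeekableInput (which also extends Closeable), declared locally only so the sketch compiles without Avro, and ChannelSeekableInput is a hypothetical adapter in which a java.nio FileChannel plays the role FSDataInputStream plays in the proposal. Neither class is part of Avro or Hadoop.

{code}
import java.io.Closeable;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical local stand-in mirroring the methods of Avro's SeekableInput. */
interface LocalSeekableInput extends Closeable {
  void seek(long p) throws IOException;
  long tell() throws IOException;
  long length() throws IOException;
  int read(byte[] b, int off, int len) throws IOException;
}

/** Hypothetical adapter: a FileChannel stands in for Hadoop's FSDataInputStream. */
public class ChannelSeekableInput implements LocalSeekableInput {
  private final FileChannel channel;
  private final long length;

  public ChannelSeekableInput(FileChannel channel) throws IOException {
    this.channel = channel;
    // Capture the length once, much as the proposal does via getFileStatus(...).getLen().
    this.length = channel.size();
  }

  @Override
  public void seek(long p) throws IOException {
    channel.position(p);
  }

  @Override
  public long tell() throws IOException {
    return channel.position();
  }

  @Override
  public long length() {
    return length;
  }

  @Override
  public int read(byte[] b, int off, int len) throws IOException {
    // FileChannel.read advances the position by the number of bytes read and
    // returns -1 at end of file, like InputStream.read(byte[], int, int).
    return channel.read(ByteBuffer.wrap(b, off, len));
  }

  @Override
  public void close() throws IOException {
    channel.close();
  }
}
{code}

One design difference: a FileChannel can report its own size, so the sketch reads it at construction time, whereas the proposal takes the length as a constructor argument because, as far as I know, FSDataInputStream has no length accessor and the caller must get it from FileSystem#getFileStatus.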

Ideally this class would live in the Avro core module, but that module does not 
depend on Hadoop Common today, so a module that already depends on Hadoop may be 
more suitable.

This would make Avro append code, such as 
https://gist.github.com/QwertyManiac/4724582, easier to write.
