[ https://issues.apache.org/jira/browse/AVRO-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072571#comment-13072571 ]

Joe Crobak commented on AVRO-867:
---------------------------------

bq. If DataFileReader were to incorporate this, then the core Avro pom might 
depend on Hadoop. Some have complained about this before, since Hadoop depends 
on Avro, creating a circular dependency. (In practice this is not an issue as 
long as both provide some backwards compatibility. Avro can build against an 
older, published version of Hadoop and vice-versa.)

Sorry, when I mentioned DataFileReader I really meant DataFileReadTool 
(the same goes for DataFileGetSchemaTool). My idea is to modify 
DataFileReadTool as follows.

Rather than:

{code}
GenericDatumReader<Object> reader = new GenericDatumReader<Object>();
// DataFileReader needs a seekable local File, so stdin and remote paths are out
FileReader<Object> fileReader =
    DataFileReader.openReader(new File(args.get(0)), reader);
// ...
for (Object datum : fileReader) {
  // ...
}
{code}

use the DataFileStream like:

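{code}
// Inside Tool.run(InputStream stdin, PrintStream out, PrintStream err, List<String> args)
GenericDatumReader<Object> reader = new GenericDatumReader<Object>();
// fileOrStdin returns stdin when the argument is "-", otherwise it opens the file
DataFileStream<Object> streamReader =
    new DataFileStream<Object>(Util.fileOrStdin(args.get(0), stdin), reader);
// ...
for (Object datum : streamReader) {
  // ...
}
{code}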

A few other tools could also be simplified by using fileOrStdin. 
How does this sound?
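A minimal sketch of the fileOrStdin-style behavior the proposal relies on follows. It assumes the convention that an argument of "-" means the tool's standard input; the class name StdinOrFile is hypothetical and is not Avro's actual Util class. A tool would then wrap the returned stream, e.g. `new DataFileStream<Object>(in, reader)`. That works for both sources because DataFileStream reads sequentially from any InputStream, whereas DataFileReader needs seekable input.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch (not Avro's actual Util class): "-" selects the stdin
// stream handed to the tool; any other argument is opened as a local file.
final class StdinOrFile {
  private StdinOrFile() {}

  static InputStream fileOrStdin(String filename, InputStream stdin)
      throws IOException {
    if ("-".equals(filename)) {
      // Return the caller's stream as-is; the caller decides when to close it.
      return stdin;
    }
    return new FileInputStream(filename);
  }

  public static void main(String[] args) throws IOException {
    InputStream fakeStdin =
        new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
    InputStream in = fileOrStdin("-", fakeStdin);
    System.out.println(in == fakeStdin); // prints true
  }
}
```

Keeping the stdin stream as a parameter, rather than reading System.in directly, lets tests substitute an in-memory stream.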

> Allow tools to read files via hadoop FileSystem class
> -----------------------------------------------------
>
>                 Key: AVRO-867
>                 URL: https://issues.apache.org/jira/browse/AVRO-867
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Joe Crobak
>            Assignee: Joe Crobak
>
> It would be great if I could use the various tools to read/parse files that 
> are in HDFS, S3, etc. via the 
> [FileSystem|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html]
>  API. We could retain backwards compatibility by assuming that unqualified 
> URLs are "file://", while still allowing files to be read from fully qualified 
> URLs such as hdfs://. The required APIs are already part of the avro-tools 
> uber jar to support the TetherTool.
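The backwards-compatible path handling described in the issue could look roughly like the minimal sketch below. ToolPaths and qualify are hypothetical names, and the sketch uses only java.net.URI. In a real tool the returned URI would presumably be handed to Hadoop, e.g. `FileSystem.get(uri, conf).open(new Path(uri))`. Defaulting explicitly to file: matters because a Hadoop Path without a scheme resolves against the configured default filesystem, which on a cluster is usually HDFS.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: arguments with a URI scheme (hdfs://, s3n://, ...)
// are kept as-is; anything else is treated as a local path (file:).
final class ToolPaths {
  private ToolPaths() {}

  static URI qualify(String arg) {
    try {
      URI uri = new URI(arg);
      if (uri.getScheme() != null) {
        return uri; // already fully qualified
      }
    } catch (URISyntaxException e) {
      // Not a valid URI (e.g. a local path containing spaces); fall through.
    }
    // Unqualified: resolve against the local filesystem as a file: URI.
    return new File(arg).getAbsoluteFile().toURI();
  }

  public static void main(String[] args) {
    System.out.println(qualify("hdfs://namenode:8020/data/part-0.avro"));
    System.out.println(qualify("data/part-0.avro").getScheme()); // prints file
  }
}
```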

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
