[
https://issues.apache.org/jira/browse/AVRO-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072571#comment-13072571
]
Joe Crobak commented on AVRO-867:
---------------------------------
bq. If DataFileReader were to incorporate this, then the core Avro pom might
depend on Hadoop. Some have complained about this before, since Hadoop depends
on Avro, creating a circular dependency. (In practice this is not an issue as
long as both provide some backwards compatibility. Avro can build against an
older, published version of Hadoop and vice-versa.)
Sorry -- when I mentioned DataFileReader I really meant DataFileReaderTool
(same goes for DataFileGetSchemaTool). My thought was to modify
DataFileReaderTool as follows...
Rather than:
{code}
GenericDatumReader<Object> reader = new GenericDatumReader<Object>();
FileReader<Object> fileReader =
    DataFileReader.openReader(new File(args.get(0)), reader);
...
for (Object datum : fileReader) {
  ...
}
{code}
use the DataFileStream like:
{code}
GenericDatumReader<Object> reader = new GenericDatumReader<Object>();
// Parameterize the stream explicitly to avoid a raw-type warning.
DataFileStream<Object> streamReader =
    new DataFileStream<Object>(Util.fileOrStdin(args.get(0)), reader);
...
for (Object datum : streamReader) {
  ...
}
{code}
A few other Tools could be simplified by using fileOrStdin, too. How does
this sound?
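To make the idea concrete, here is a minimal sketch of how a helper could decide where to read from, using only the JDK. It treats unqualified paths as "file://", as the issue description proposes. InputResolver, schemeOf, and open are hypothetical names, not Avro APIs, and treating "-" as standard input is an assumption. The Hadoop FileSystem lookup for other schemes is deliberately left out.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only; not part of Avro or Hadoop.
final class InputResolver {
  private InputResolver() {}

  /** Returns the URI scheme of a path, defaulting to "file" when unqualified. */
  static String schemeOf(String path) {
    try {
      String scheme = new URI(path).getScheme();
      return scheme == null ? "file" : scheme;
    } catch (URISyntaxException e) {
      // Not parseable as a URI (e.g. contains spaces): treat as a local path.
      return "file";
    }
  }

  /** Opens "-" as stdin (an assumption) and local files directly. */
  static InputStream open(String path) throws IOException {
    if ("-".equals(path)) {
      return System.in;
    }
    String scheme = schemeOf(path);
    if ("file".equals(scheme)) {
      // Strip the scheme from URLs such as file:///tmp/x.avro.
      String local = path.startsWith("file:") ? URI.create(path).getPath() : path;
      if (local == null) {
        throw new IOException("Unsupported file URL: " + path);
      }
      return new FileInputStream(local);
    }
    // A real implementation would hand hdfs://, s3://, etc. to Hadoop's FileSystem here.
    throw new IOException("Scheme '" + scheme + "' is not handled in this sketch");
  }
}
```

With this, schemeOf("data.avro") yields "file" and schemeOf("hdfs://namenode/data/x.avro") yields "hdfs". Windows drive-letter paths such as C:/data would need extra handling, since the drive letter parses as a scheme.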
> Allow tools to read files via hadoop FileSystem class
> -----------------------------------------------------
>
> Key: AVRO-867
> URL: https://issues.apache.org/jira/browse/AVRO-867
> Project: Avro
> Issue Type: New Feature
> Components: java
> Reporter: Joe Crobak
> Assignee: Joe Crobak
>
> It would be great if I could use the various tools to read/parse files that
> are in HDFS, S3, etc. via the
> [FileSystem|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html]
> API. We could retain backwards compatibility by assuming that unqualified
> URLs are "file://" but allow reading of files from fully qualified URLs such
> as hdfs://. The required APIs are already part of the avro-tools uber jar to
> support the TetherTool.