[ https://issues.apache.org/jira/browse/CRUNCH-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13653087#comment-13653087 ]

Josh Wills commented on CRUNCH-204:
-----------------------------------

Hey Ben, you're right. This is something we should clean up, so that text files
work as text files and Avro files (and even SequenceFiles) work properly too.
                
> MemPipeline.write() is inconsistent with MemPipeline.read()
> -----------------------------------------------------------
>
>                 Key: CRUNCH-204
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-204
>             Project: Crunch
>          Issue Type: Bug
>    Affects Versions: 0.5.0
>            Reporter: Ben Roling
>
> MemPipeline.read(AvroFileSource) of a file written with 
> MemPipeline.write(collection, AvroFileTarget) fails with java.io.IOException: 
> Not a data file.
> It seems the way the file is written is inconsistent with the way it is read.
> The file does not appear to actually be written out in Avro format. It seems
> MemPipeline.write() simply calls toString() on each element of the
> collection and writes the results to the target's path, one element per
> line.
> Here is a simple test that demonstrates the issue:
> {code}
> final Pipeline memPipeline = MemPipeline.getInstance();
> final String path = "persons";
> final PCollection<Person> persons =
>     MemPipeline.collectionOf(Collections.singleton(new Person("John Doe")));
> memPipeline.write(persons, new AvroFileTarget(path));
>
> // throws IOException!
> memPipeline.read(new AvroFileSource<Person>(new Path(path),
>     Avros.records(Person.class)));
> {code}
> The Person class in the example is based on this simple Avro schema:
> {code}
> @namespace("org.foo.model")
> protocol PersonProtocol {
>   record Person {
>     string name;
>   }
> }
> {code}
> This is pretty confusing behavior.  I ran into it trying to do some simple 
> testing and it took me longer than I'd like to admit to figure out what was 
> going on.  I imagine others will run into it and be similarly confused.
> I've left the priority at the default of Major, although I suppose that point
> could be argued. Reset it as you like.
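
The "Not a data file." failure can be illustrated without Crunch or Avro on the classpath. Avro object container files begin with the four magic bytes 'O', 'b', 'j', 0x01, and Avro's reader rejects input that lacks them with an IOException carrying that message. The sketch below is a hypothetical, stdlib-only mimic of the behavior described in the report: writeAsText, hasAvroMagic and checkAvroHeader are illustrative names, not Crunch or Avro APIs, and checkAvroHeader is a simplified stand-in for the reader's real header check.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical illustration (not Crunch or Avro code): mimics the reported
 * MemPipeline.write() behavior and checks the output for the Avro
 * object container file header.
 */
class AvroMagicCheck {

    // Avro object container files start with the bytes 'O', 'b', 'j', 1.
    static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};

    // Mimics the reported behavior: toString() on each element, one per line.
    static byte[] writeAsText(List<?> elements) {
        StringBuilder sb = new StringBuilder();
        for (Object element : elements) {
            sb.append(element).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // True if the data begins with the Avro container magic bytes.
    static boolean hasAvroMagic(byte[] data) {
        return data.length >= AVRO_MAGIC.length
            && Arrays.equals(Arrays.copyOf(data, AVRO_MAGIC.length), AVRO_MAGIC);
    }

    // Simplified stand-in for the header check an Avro reader performs.
    static void checkAvroHeader(byte[] data) throws IOException {
        if (!hasAvroMagic(data)) {
            throw new IOException("Not a data file.");
        }
    }

    public static void main(String[] args) {
        // Assumption: an Avro specific record's toString() yields JSON-like text.
        byte[] written = writeAsText(List.of("{\"name\": \"John Doe\"}"));
        try {
            checkAvroHeader(written);
            System.out.println("looks like an Avro data file");
        } catch (IOException e) {
            System.out.println("read fails: " + e.getMessage());
        }
    }
}
```

Running main prints "read fails: Not a data file.", because the text output starts with '{' rather than the container magic. A write path that honored AvroFileTarget would need to emit a real container file (header, schema, data blocks) instead of toString() lines.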

