[
https://issues.apache.org/jira/browse/CRUNCH-204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Christian Tzolov updated CRUNCH-204:
------------------------------------
Attachment: CRUNCH-204.patch
Here is a patch and an integration test.
This patch should support the specific, generic and reflection Avro types.
> MemPipeline.write() is inconsistent with MemPipeline.read()
> -----------------------------------------------------------
>
> Key: CRUNCH-204
> URL: https://issues.apache.org/jira/browse/CRUNCH-204
> Project: Crunch
> Issue Type: Bug
> Affects Versions: 0.5.0
> Reporter: Ben Roling
> Assignee: Josh Wills
> Attachments: CRUNCH-204.patch
>
>
> MemPipeline.read(AvroFileSource) of a file written with
> MemPipeline.write(collection, AvroFileTarget) fails with java.io.IOException:
> Not a data file.
> It seems the way the file is written is inconsistent with the way it is read.
> The file appears not to actually be written out in Avro format. It seems
> MemPipeline.write() simply does a toString() on each of the elements in the
> collection and spits that out to the target's path with each element
> separated by a newline.
> Here is a simple test that demonstrates the issue:
> {code}
> final Pipeline memPipeline = MemPipeline.getInstance();
> final String path = "persons";
> final PCollection<Person> persons =
> MemPipeline.collectionOf(Collections.singleton(new Person("John Doe")));
> memPipeline.write(persons, new AvroFileTarget(path));
>
> // throws IOException!
> memPipeline.read(new AvroFileSource<Person>(new Path(path),
> Avros.records(Person.class)));
> {code}
> The Person class in the example is based on this simple Avro schema:
> {code}
> @namespace("org.foo.model")
> protocol PersonProtocol {
> record Person {
> string name;
> }
> }
> {code}
> This is pretty confusing behavior. I ran into it trying to do some simple
> testing and it took me longer than I'd like to admit to figure out what was
> going on. I imagine others will run into it and be similarly confused.
> I've left the priority as the default of Major although I suppose that point
> could be argued. Reset it as you like.
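To illustrate why the read fails, here is a minimal sketch that uses only the JDK and no Crunch or Avro classes. It assumes, as the report above suggests, that the write path emits one toString() line per element; the class and method names (AvroMagicCheck, writeAsText, looksLikeAvroDataFile) are hypothetical and are not part of Crunch or of the attached patch. An Avro object container file starts with the four magic bytes 'O', 'b', 'j', 1, so newline-separated text cannot pass that check, which would be consistent with the "Not a data file" error.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

public class AvroMagicCheck {
    // First four bytes of every Avro object container file: 'O', 'b', 'j', 1.
    static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};

    // Hypothetical stand-in for the behaviour described in the report:
    // each element is written as its toString() followed by a newline.
    public static byte[] writeAsText(Collection<?> elements) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Object element : elements) {
            out.write(String.valueOf(element).getBytes(StandardCharsets.UTF_8));
            out.write('\n');
        }
        return out.toByteArray();
    }

    // Returns true only if the content begins with the Avro container magic.
    public static boolean looksLikeAvroDataFile(byte[] content) {
        if (content.length < AVRO_MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(content, AVRO_MAGIC.length), AVRO_MAGIC);
    }

    public static void main(String[] args) throws IOException {
        // Assumed textual form of a Person record; the exact string does not matter.
        byte[] written = writeAsText(Collections.singleton("{\"name\": \"John Doe\"}"));
        System.out.println(looksLikeAvroDataFile(written)); // prints "false"
    }
}
```

Any fix therefore has to make the write side produce a real Avro container (header, schema, and sync markers) rather than text, so that both sides agree on the on-disk format.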