[jira] [Commented] (CRUNCH-360) GenericData.Record avro records without schema namespace gets implicit namespace"crunch"

Gabriel Reid (JIRA) Thu, 27 Feb 2014 07:02:34 -0800

    [ 
https://issues.apache.org/jira/browse/CRUNCH-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13914615#comment-13914615
 ]


Gabriel Reid commented on CRUNCH-360:
-------------------------------------

To my knowledge, Crunch doesn't explicitly set the namespace to "crunch" on any 
schemas that it doesn't create itself. It does set the namespace to "crunch" on 
schemas it creates itself, such as union and tuple schemas. 

The issue that was happening in AVRO-1295 was that an enclosing schema's 
namespace would be pushed down to the inner schemas if the inner schemas didn't 
have an explicit namespace. I put together a small test case that exhibits the 
behaviour you're outlining here (when using a union) with Avro 1.7.4, and it 
works properly with Avro 1.7.5. I'll upload that test case in just a moment for 
reference.

Could you post a small test case that demonstrates the issue? 

> GenericData.Record avro records without schema namespace gets implicit 
> namespace"crunch"
> ----------------------------------------------------------------------------------------
>
>                 Key: CRUNCH-360
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-360
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Magnus Runesson
>
> When having avroschema without namespace crunch implicit adds the namespace 
> "crunch" when working with the records. Unfortunately this is not happening 
> to the schema when reading an avrofile with At.avroFile(Path path, 
> Configuration conf). The schema still has no namespace.
> In my job it ends up that my job fails looking up in the schema with the 
> following error:
> The job uses Avro GenericData.Record and gets the schema from the avro file 
> that is read. 
> 2014-02-27 09:58:14,236 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : org.apache.avro.UnresolvedUnionException: Not in 
> union 
> [{"type":"record","name":"UsernameToUserId","namespace":"crunch","fields":[{"name":"username","type":["string","null"]},{"name":"user_id","type":["string","null"]}]},"null"]:
>  {"username": "XXXXXXXXXX", "user_id": "XXXXXXXXXXXXXXXXXX"}
>       at 
> org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:561)
>       at 
> org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:144)
>       at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:71)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:106)
>       at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
>       at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
>       at 
> org.apache.crunch.types.avro.SafeAvroSerialization$AvroWrapperSerializer.serialize(SafeAvroSerialization.java:128)
>       at 
> org.apache.crunch.types.avro.SafeAvroSerialization$AvroWrapperSerializer.serialize(SafeAvroSerialization.java:113)
>       at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1135)
>       at 
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
>       at 
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>       at 
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>       at 
> org.apache.crunch.impl.mr.emit.OutputEmitter.emit(OutputEmitter.java:41)
>       at org.apache.crunch.MapFn.process(MapFn.java:34)
>       at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:98)
>       at 
> org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:56)
>       at org.apache.crunch.MapFn.process(MapFn.java:34)
>       at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:98)
>       at 
> org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:56)
>       at org.apache.crunch.MapFn.process(MapFn.java:34)
>       at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:98)
>       at 
> org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:56)
>       at org.apache.crunch.MapFn.process(MapFn.java:34)
>       at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:98)
>       at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:109)
>       at org.apache.crunch.impl.mr.run.CrunchMapper.map(CrunchMapper.java:60)
>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
>       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
> Either implicit namespace "crunch" should not be added anywhere. Or it must 
> be added, if no namespace provided, when reading schema from the avro file i 
> At.java:
>   public static SourceTarget<GenericData.Record> avroFile(Path path, 
> Configuration conf) {
>     return avroFile(path, Avros.generics(From.getSchemaFromPath(path, conf)));
>   }
> I use on HEAD in master.
>  



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

[jira] [Commented] (CRUNCH-360) GenericData.Record avro records without schema namespace gets implicit namespace"crunch"

Reply via email to