[jira] [Commented] (AVRO-1215) AvroMultipleOutputs not working when specifying baseOutputPath

Ashish Nagavaram (JIRA) Thu, 31 Jan 2013 17:38:14 -0800

    [ https://issues.apache.org/jira/browse/AVRO-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568359#comment-13568359 ]


Ashish Nagavaram commented on AVRO-1215:
----------------------------------------

Hi Priyo,

I have already attached a patch to this issue,
https://issues.apache.org/jira/secure/attachment/12566040/AVRO-1215-v3.patch,
which fixes write(Object key, Object value, String baseOutputPath). I have
tested it and it works for me.

I also added another method to AvroMultipleOutputs, write(K, V, keyschema,
valueschema, baseoutputpath), which takes the key and value schemas as
arguments so that multiple schemas can be used. Can you try testing your code
with this method?
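
For anyone trying the new overload, here is a toy, stdlib-only model of what routing writes by baseOutputPath with per-call schemas amounts to. It is a sketch, not the code in AVRO-1215-v3.patch, and it uses no Avro or Hadoop classes: `MultiOutputs`, `FakeWriter`, and the rule that rejects a second schema pair for an already-open path are all hypothetical. The rejection rule is only a design choice of this sketch, motivated by the fact that an Avro container file records a single schema in its header.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, stdlib-only model of routing writes by baseOutputPath with
// per-call key/value schemas. No Avro or Hadoop classes are used.
public class PerCallSchemaDemo {

  // Stands in for one output file: the schemas it was opened with and the
  // records written to it.
  static final class FakeWriter {
    final String keySchema;
    final String valueSchema;
    final List<String> records = new ArrayList<>();

    FakeWriter(String keySchema, String valueSchema) {
      this.keySchema = keySchema;
      this.valueSchema = valueSchema;
    }
  }

  static final class MultiOutputs {
    // One writer per distinct baseOutputPath.
    private final Map<String, FakeWriter> writers = new HashMap<>();

    // Same argument order as write(K, V, keyschema, valueschema, baseoutputpath).
    <K, V> void write(K key, V value, String keySchema, String valueSchema,
                      String baseOutputPath) {
      FakeWriter w = writers.computeIfAbsent(
          baseOutputPath, p -> new FakeWriter(keySchema, valueSchema));
      // Sketch-only rule: an Avro container file has one schema in its header,
      // so a different schema pair for an already-open path is rejected.
      if (!w.keySchema.equals(keySchema) || !w.valueSchema.equals(valueSchema)) {
        throw new IllegalStateException("schema mismatch for " + baseOutputPath);
      }
      w.records.add(key + "=" + value);
    }

    FakeWriter writerFor(String baseOutputPath) {
      return writers.get(baseOutputPath);
    }

    int openWriters() {
      return writers.size();
    }
  }

  public static void main(String[] args) {
    MultiOutputs out = new MultiOutputs();
    out.write("a", 1, "\"string\"", "\"int\"", "counts/part");
    out.write("b", 2, "\"string\"", "\"int\"", "counts/part");
    out.write("x", "hello", "\"string\"", "\"string\"", "labels/part");
    System.out.println(out.openWriters()); // prints 2
  }
}
```

Two base paths, each with its own schema pair, end up as two separate writers; repeated writes to the same base path reuse the writer that is already open.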

The main reason for storing the schemas in a map was to avoid parsing them
again, since some schema declarations may be huge.

This map is populated from the main function of the MapReduce job, and the
AvroMultipleOutputs instance is created in the reducer's setup method, as in
TestAvroMultipleOutputs:
http://svn.apache.org/repos/asf/avro/trunk/lang/java/mapred/src/test/java/org/apache/avro/mapreduce/TestAvroMultipleOutputs.java
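
The parse-once idea can be sketched as follows, assuming a plain `Map<String, String>` of schema text per named output stands in for whatever the driver's main function populates. `SchemaCache` is a hypothetical helper, not part of AvroMultipleOutputs. The parser is passed in as a function, so with Avro on the classpath one could supply something like `text -> new Schema.Parser().parse(text)`; here a counting stand-in is used so the example runs without dependencies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: parse each named output's schema text at most once.
public class SchemaCacheDemo {

  static final class SchemaCache<S> {
    private final Map<String, String> schemaText;  // named output -> schema JSON
    private final Function<String, S> parser;      // turns schema JSON into S
    private final Map<String, S> parsed = new HashMap<>();

    SchemaCache(Map<String, String> schemaText, Function<String, S> parser) {
      this.schemaText = schemaText;
      this.parser = parser;
    }

    // Parses on first use, then returns the cached result.
    S get(String namedOutput) {
      String text = schemaText.get(namedOutput);
      if (text == null) {
        throw new IllegalArgumentException("unknown named output: " + namedOutput);
      }
      return parsed.computeIfAbsent(namedOutput, k -> parser.apply(text));
    }
  }

  public static void main(String[] args) {
    // Driver side: record schema text per named output.
    Map<String, String> registered = new HashMap<>();
    registered.put("counts",
        "{\"type\":\"record\",\"name\":\"Count\",\"fields\":[{\"name\":\"n\",\"type\":\"long\"}]}");

    // Reducer setup: build the cache once. The counter stands in for an
    // expensive parse.
    int[] parses = {0};
    SchemaCache<String> cache = new SchemaCache<>(registered, text -> {
      parses[0]++;
      return text;
    });

    // Reduce calls: many lookups, one parse.
    for (int i = 0; i < 1000; i++) {
      cache.get("counts");
    }
    System.out.println(parses[0]); // prints 1
  }
}
```

Building the cache once in the reducer's setup method and reusing it in every reduce call keeps the parse cost to one per named output per task.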

Please let me know if it still doesn't work. In the meantime, I will add
methods that expose the key and value schemas for a given job and named output.

                
> AvroMultipleOutputs not working when specifying baseOutputPath
> --------------------------------------------------------------
>
>                 Key: AVRO-1215
>                 URL: https://issues.apache.org/jira/browse/AVRO-1215
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.2
>            Reporter: Matthew Hayes
>            Assignee: Ashish Nagavaram
>              Labels: avro, mapreduce
>         Attachments: avro-1215.patch, AVRO-1215.patch, AVRO-1215-v2.patch, AVRO-1215-v3.patch
>
>
> I'm calling the write() method of AvroMultipleOutputs that takes a
> baseOutputPath. The reducer appears to hang once it tries to write to a
> baseOutputPath value it has not encountered before. It then fails with:
> org.apache.hadoop.ipc.RemoteException: 
> org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to 
> create file ... because current leaseholder is trying to recreate file.
> I think the problem has to do with this line in AvroMultipleOutputs:
> {code}
> // get the record writer from context output format
> //FileOutputFormat.setOutputName(taskContext, baseFileName);
> {code}
> This line is not commented out in the similar code in Hadoop, so I think
> the baseOutputPath is ignored. As a result, every record writer is created
> with the same path, which leads to the exception.
> Uncommenting this line does not work because the method is not visible
> here. All it does is set "mapreduce.output.basename", but setting that
> property directly doesn't work either.
> After digging through the Avro code, I found that AvroOutputFormatBase uses
> "avro.mo.config.namedOutput" to build the output path. If I replace the
> commented-out line with the following, it seems to work:
> {code}
> taskContext.getConfiguration().set("avro.mo.config.namedOutput", baseFileName);
> {code}
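
The failure mode in the report can be modelled without Hadoop. This sketch is hypothetical throughout: `FakeFs` stands in for HDFS and throws when a path is created twice (loosely like the AlreadyBeingCreatedException above), and `outputPath` is an assumed model of a path builder that reads "avro.mo.config.namedOutput" and falls back to a default name. The `out/` prefix and `-r-00000.avro` suffix are illustrative, not Avro's exact naming scheme.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the reported bug; no Hadoop or Avro classes are used.
public class BaseOutputPathDemo {

  // Configuration key named in the report.
  static final String NAMED_OUTPUT_KEY = "avro.mo.config.namedOutput";

  // Stand-in for HDFS: creating a path twice fails, loosely like the
  // AlreadyBeingCreatedException in the report.
  static final class FakeFs {
    private final Set<String> created = new HashSet<>();

    void create(String path) {
      if (!created.add(path)) {
        throw new IllegalStateException("already being created: " + path);
      }
    }
  }

  // Assumed path builder: uses the configured name, else a default.
  static String outputPath(Map<String, String> conf, String defaultName) {
    return "out/" + conf.getOrDefault(NAMED_OUTPUT_KEY, defaultName) + "-r-00000.avro";
  }

  // Opens one writer per distinct baseOutputPath and returns the files created.
  // With applyWorkaround, the base path is copied into the configuration first,
  // mirroring the taskContext.getConfiguration().set(...) line in the report.
  static Set<String> openWriters(String[] baseOutputPaths, boolean applyWorkaround) {
    FakeFs fs = new FakeFs();
    Map<String, String> conf = new HashMap<>();
    Set<String> seenBases = new HashSet<>();
    Set<String> files = new HashSet<>();
    for (String base : baseOutputPaths) {
      if (!seenBases.add(base)) {
        continue; // a writer for this base path is already open
      }
      if (applyWorkaround) {
        conf.put(NAMED_OUTPUT_KEY, base);
      }
      String path = outputPath(conf, "part");
      fs.create(path);
      files.add(path);
    }
    return files;
  }

  public static void main(String[] args) {
    String[] bases = {"counts/part", "labels/part", "counts/part"};
    try {
      openWriters(bases, false);
    } catch (IllegalStateException e) {
      System.out.println("without workaround: " + e.getMessage());
    }
    System.out.println("with workaround: " + openWriters(bases, true));
  }
}
```

Without the workaround, the second distinct base path resolves to the same file as the first and creation fails; with it, each base path gets its own file.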
