No, my job does not write files directly to disk. It simply fetches some
web pages, reads the data (in the reduce phase), and parses the JSON into
Thrift objects, which are emitted via the standard MultipleOutputs API to
HDFS files.
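
Concretely, the reduce side looks roughly like the sketch below (names
such as PageReducer and the "pages" output are simplified, and the JSON
to Thrift parsing is elided):

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Simplified reducer: "pages" is a named output that is registered at
// job setup with MultipleOutputs.addNamedOutput(...).
public class PageReducer extends Reducer<Text, Text, Text, Text> {
  private MultipleOutputs<Text, Text> out;

  @Override
  protected void setup(Context ctx) {
    out = new MultipleOutputs<Text, Text>(ctx);
  }

  @Override
  protected void reduce(Text url, Iterable<Text> pages, Context ctx)
      throws IOException, InterruptedException {
    for (Text page : pages) {
      // parse the JSON into a Thrift object here (elided), then emit:
      out.write("pages", url, page);
    }
  }

  @Override
  protected void cleanup(Context ctx)
      throws IOException, InterruptedException {
    out.close(); // an unclosed MultipleOutputs can leave HDFS files open
  }
}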

Any idea why Hadoop would throw the "AlreadyBeingCreatedException"?

On Mon, Apr 2, 2012 at 2:52 PM, Harsh J <ha...@cloudera.com> wrote:

> Jay,
>
> What does your job do? Create files directly on HDFS? If so, do you
> follow this method?:
>
> http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F
>
> A local filesystem may not complain if you re-create an existing file,
> but HDFS behaves differently here. This simple Python test shows what
> I mean:
> >>> a = open('a', 'w')
> >>> a.write('f')
> >>> b = open('a', 'w')
> >>> b.write('s')
> >>> a.close(), b.close()
> >>> open('a').read()
> 's'
>
> Hence it is best to use the FileOutputCommitter framework as detailed
> in the mentioned link (see the sketch after this quoted mail).
>
> On Mon, Apr 2, 2012 at 7:09 PM, Jay Vyas <jayunit...@gmail.com> wrote:
> > Hi guys:
> >
> > I have a MapReduce job that runs normally on the local file system from
> > Eclipse, *but* it fails on HDFS running in pseudo-distributed mode.
> >
> > The exception I see is:
> >
> > *org.apache.hadoop.ipc.RemoteException:
> > org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:*
> >
> >
> > Any thoughts on why this might occur in pseudo-distributed mode, but
> > not on the regular file system?
>
>
>
> --
> Harsh J
>
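
For reference, my reading of the side-file pattern in that FAQ entry is
roughly the following (a sketch only; SideFileReducer and the file name
are made up): each task writes under its attempt-scoped work output
path, and the FileOutputCommitter promotes the files on commit, so
speculative or retried attempts never collide on the same HDFS path.

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SideFileReducer extends Reducer<Text, Text, Text, Text> {
  @Override
  protected void reduce(Text key, Iterable<Text> values, Context ctx)
      throws IOException, InterruptedException {
    // The work output path is attempt-scoped (it lives under _temporary),
    // so a speculative or retried attempt never opens the same HDFS file.
    Path workDir = FileOutputFormat.getWorkOutputPath(ctx);
    Path sideFile = new Path(workDir, "side-" + key); // assumes path-safe keys
    FileSystem fs = sideFile.getFileSystem(ctx.getConfiguration());
    FSDataOutputStream out = fs.create(sideFile, false); // overwrite=false: fail loudly
    try {
      out.writeUTF(key.toString()); // placeholder for the real payload
    } finally {
      out.close();
    }
    // On commit the FileOutputCommitter moves these files into the job's
    // final output directory; output of failed attempts is discarded.
  }
}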



-- 
Jay Vyas
MMSB/UCHC
