Thanks J : just curious about how you came to hypothesize (1) (i.e. regarding the fact that threads and the API components aren't thread safe in my Hadoop version).
I think that's a really good guess, and I would like to be able to make those sorts of intelligent hypotheses myself. Any reading you can point me to for further enlightenment?

On Mon, Apr 2, 2012 at 3:16 PM, Harsh J <ha...@cloudera.com> wrote:
> Jay,
>
> Without seeing the whole stack trace all I can say as cause for that
> exception from a job is:
>
> 1. You're using threads and the API components you are using aren't
> thread safe in your version of Hadoop.
> 2. Files are being written out to HDFS directories without following
> the OC rules. (This is negated, per your response.)
>
> On Mon, Apr 2, 2012 at 7:35 PM, Jay Vyas <jayunit...@gmail.com> wrote:
> > No, my job does not write files directly to disk. It simply goes to some
> > web pages, reads data (in the reducer phase), and parses JSONs into thrift
> > objects which are emitted via the standard MultipleOutputs API to HDFS
> > files.
> >
> > Any idea why Hadoop would throw the "AlreadyBeingCreatedException"?
> >
> > On Mon, Apr 2, 2012 at 2:52 PM, Harsh J <ha...@cloudera.com> wrote:
> >> Jay,
> >>
> >> What does your job do? Create files directly on HDFS? If so, do you
> >> follow this method?:
> >>
> >> http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F
> >>
> >> A local filesystem may not complain if you re-create an existent file.
> >> HDFS' behavior here is different. This simple Python test is what I
> >> mean:
> >> >>> a = open('a', 'w')
> >> >>> a.write('f')
> >> >>> b = open('a', 'w')
> >> >>> b.write('s')
> >> >>> a.close(), b.close()
> >> >>> open('a').read()
> >> 's'
> >>
> >> Hence it is best to use the FileOutputCommitter framework as detailed
> >> in the mentioned link.
> >>
> >> On Mon, Apr 2, 2012 at 7:09 PM, Jay Vyas <jayunit...@gmail.com> wrote:
> >> > Hi guys:
> >> >
> >> > I have a map reduce job that runs normally on the local file system from
> >> > Eclipse, *but* it fails on HDFS running in pseudo-distributed mode.
> >> >
> >> > The exception I see is
> >> >
> >> > *org.apache.hadoop.ipc.RemoteException:
> >> > org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:*
> >> >
> >> > Any thoughts on why this might occur in pseudo-distributed mode, but not in
> >> > the regular file system?
> >>
> >> --
> >> Harsh J
> >
> > --
> > Jay Vyas
> > MMSB/UCHC
>
> --
> Harsh J

--
Jay Vyas
MMSB/UCHC
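
[Editor's note] For reference, a minimal sketch of the "OC rules" Harsh points to in the FAQ link, using the new (mapreduce) API: any file a task creates directly on HDFS should go under that task attempt's work directory, which the FileOutputCommitter promotes into the job output directory only if the attempt succeeds. The SideFileReducer class, types, and file names below are illustrative assumptions, not code from this thread.

import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SideFileReducer extends Reducer<Text, Text, Text, NullWritable> {

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    // The per-attempt work directory (something like ${output}/_temporary/.../${attemptId}).
    // Every attempt gets its own directory, so re-run or speculative attempts never try to
    // create the same HDFS path -- which is what raises AlreadyBeingCreatedException.
    Path workDir = FileOutputFormat.getWorkOutputPath(context);
    Path sideFile = new Path(workDir, "side-" + key.toString());

    FileSystem fs = sideFile.getFileSystem(context.getConfiguration());
    FSDataOutputStream out = fs.create(sideFile);
    try {
      for (Text value : values) {
        out.writeBytes(value.toString() + "\n");
      }
    } finally {
      out.close();
    }
    context.write(key, NullWritable.get());
  }
}

Writing anywhere else on HDFS (a fixed path shared by tasks, or the final output directory itself) loses that per-attempt isolation, which is why the same job can pass on the local filesystem and fail in pseudo-distributed mode.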
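
[Editor's note] On hypothesis (1): the usual way to keep MultipleOutputs out of trouble is one instance per task attempt, created in setup() and closed in cleanup(), never shared across threads and never given a baseOutputPath that two concurrent tasks could both resolve to. Two concurrent create() calls on one HDFS path are exactly what AlreadyBeingCreatedException reports, since HDFS grants only one writer lease per file. A sketch of that pattern follows; the JsonReducer class and the "thriftJson" named output are hypothetical, not the job discussed in this thread.

import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class JsonReducer extends Reducer<Text, Text, Text, NullWritable> {

  private MultipleOutputs<Text, NullWritable> mos;

  @Override
  protected void setup(Context context) {
    // One MultipleOutputs per task attempt; it writes under the attempt's own
    // work directory, so parallel attempts do not open the same HDFS file.
    mos = new MultipleOutputs<Text, NullWritable>(context);
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    for (Text value : values) {
      // Base path derived from the key: reducers see disjoint keys, so no two
      // tasks create the same file.
      mos.write("thriftJson", value, NullWritable.get(), "json/" + key.toString());
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    mos.close();
  }
}

The driver would also need a matching MultipleOutputs.addNamedOutput(job, "thriftJson", TextOutputFormat.class, Text.class, NullWritable.class) call -- again assuming the Text/NullWritable types used in this sketch.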