Jay,

Without seeing the whole stack trace, all I can suggest as a cause for
that exception from a job is:

1. You're using threads, and the API components you are using aren't
thread-safe in your version of Hadoop.
2. Files are being written out to HDFS directories without following
the OutputCommitter rules (you've ruled this out, per your response);
see the sketch below.
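
If it helps, here's a minimal sketch of the pattern the FAQ entry below
describes, using the org.apache.hadoop.mapreduce API (the class and
side-file names are made up for illustration): resolve the task
attempt's private work directory with
FileOutputFormat.getWorkOutputPath() and create side files there, so
concurrent or retried attempts never open the same HDFS path, and
FileOutputCommitter promotes the winning attempt's files on commit.

import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SideFileReducer
    extends Reducer<Text, Text, NullWritable, NullWritable> {

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    // This attempt's private directory (${output}/_temporary/_attempt_*),
    // not the final output directory; each attempt gets its own copy.
    Path workDir = FileOutputFormat.getWorkOutputPath(context);
    Path sideFile = new Path(workDir, "side-" + key.toString());

    FileSystem fs = sideFile.getFileSystem(context.getConfiguration());
    // Because the path is unique per attempt, a speculative or retried
    // attempt cannot collide with a lease another attempt still holds,
    // which is what surfaces as AlreadyBeingCreatedException.
    FSDataOutputStream out = fs.create(sideFile);
    try {
      for (Text value : values) {
        out.writeBytes(value.toString() + "\n");
      }
    } finally {
      out.close();
    }
  }
}

Since your files go through MultipleOutputs, which already writes under
this work directory, I'd look harder at cause 1: make sure a single
MultipleOutputs instance isn't being shared across threads without
synchronization.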

On Mon, Apr 2, 2012 at 7:35 PM, Jay Vyas <jayunit...@gmail.com> wrote:
> No, my job does not write files directly to disk. It simply goes to some
> web pages, reads data (in the reducer phase), and parses JSON into Thrift
> objects, which are emitted via the standard MultipleOutputs API to HDFS
> files.
>
> Any idea why Hadoop would throw the "AlreadyBeingCreatedException"?
>
> On Mon, Apr 2, 2012 at 2:52 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> Jay,
>>
>> What does your job do? Does it create files directly on HDFS? If so,
>> do you follow this method:
>>
>> http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F
>>
>> A local filesystem may not complain if you re-create an existing file,
>> but HDFS behaves differently here. This simple Python test shows what
>> I mean:
>> >>> a = open('a', 'w')
>> >>> a.write('f')
>> >>> b = open('a', 'w')
>> >>> b.write('s')
>> >>> a.close(), b.close()
>> >>> open('a').read()
>> 's'
>>
>> Hence it is best to use the FileOutputCommitter framework as detailed
>> in the mentioned link.
>>
>> On Mon, Apr 2, 2012 at 7:09 PM, Jay Vyas <jayunit...@gmail.com> wrote:
>> > Hi guys:
>> >
>> > I have a MapReduce job that runs normally on the local filesystem from
>> > Eclipse, *but* it fails on HDFS running in pseudo-distributed mode.
>> >
>> > The exception I see is:
>> >
>> > *org.apache.hadoop.ipc.RemoteException:
>> > org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:*
>> >
>> >
>> > Any thoughts on why this might occur in pseudo-distributed mode, but
>> > not on the regular filesystem?
>>
>>
>>
>> --
>> Harsh J
>>
>
>
>
> --
> Jay Vyas
> MMSB/UCHC



-- 
Harsh J
