Re: Binary Files With No Record Begin and End

MJ Sam Thu, 05 Jul 2012 13:02:15 -0700

By block size, do you mean the HDFS block size, the split size, or my
record size? The problem is this: given a split, how does my record
reader find where a record starts in the stream handed to the mapper
when there is no record-start tag? Could you please explain in more
detail what you mean?


On Thu, Jul 5, 2012 at 11:57 AM, Kai Voigt <k...@123.org> wrote:
> Hi,
>
> If you know the block size, you can calculate the offsets for your records
> and write a custom record reader class to seek to them.
>
> Kai
>
> On 05.07.2012 at 22:54, MJ Sam wrote:
>
>> Hi,
>>
>> The input to my MapReduce job is a binary file with no record begin or
>> end markers. The only thing I know is that each record has a fixed size
>> of 180 bytes. How do I make Hadoop find the records correctly when a
>> record spans two splits? I was thinking of making the split size a
>> multiple of 180, but was wondering whether there is anything else I
>> can do. Please note that my files are not sequence files, just a
>> custom binary format.
>>
>
> --
> Kai Voigt
> k...@123.org
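
The offset calculation suggested above can be sketched as follows. This is
only an illustration of the arithmetic, written in plain Python rather than
as a Hadoop RecordReader, and it rests on assumptions that are not stated in
the thread: records start at byte 0 of the file, a record belongs to the
split in which its first byte lies, and the reader is allowed to read past
the end of its split to finish that record. The names first_record_offset
and read_split are hypothetical. With 180-byte records, the first record of
a split starting at byte S begins at ceil(S / 180) * 180.

```python
import io

RECORD_SIZE = 180  # fixed record length in bytes, from the original question


def first_record_offset(split_start, record_size=RECORD_SIZE):
    """Return the offset of the first record starting at or after split_start."""
    # Ceiling division using integers only: ceil(a / b) == -(-a // b).
    return -(-split_start // record_size) * record_size


def read_split(stream, split_start, split_length, record_size=RECORD_SIZE):
    """Yield (offset, record) for every record whose first byte is in the split.

    A record that starts in this split is read in full even if it runs past
    the split end; the next split skips it because its first record offset is
    rounded up to the next record boundary.
    """
    split_end = split_start + split_length
    pos = first_record_offset(split_start, record_size)
    while pos < split_end:
        stream.seek(pos)
        record = stream.read(record_size)
        if len(record) < record_size:
            break  # trailing bytes that do not form a whole record
        yield pos, record
        pos += record_size


if __name__ == "__main__":
    # Ten records, cut into 500-byte splits that do not align with 180.
    data = bytes(i % 256 for i in range(10 * RECORD_SIZE))
    split_size = 500
    offsets = []
    for start in range(0, len(data), split_size):
        length = min(split_size, len(data) - start)
        offsets += [off for off, _ in read_split(io.BytesIO(data), start, length)]
    print(offsets)  # each record offset appears once, despite misaligned splits
```

Under these assumptions the split size does not need to be a multiple of
180: each reader rounds its start up to a record boundary and finishes the
last record it started, so every record is read exactly once. Making the
split size a multiple of 180 is still a reasonable simplification, since it
avoids reads that cross a split boundary.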
