You can use mapPartitions to do that.
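
A rough sketch of the idea, in plain Java with no Spark dependency so it can be run on its own. Spark's mapPartitions (and foreachPartition, if the write is a pure side effect) hands your function an iterator over one whole partition, so you can allocate a single buffer per partition and reuse it for every record. The class name, encodeInto and writePartition are all hypothetical names made up for illustration, and the OutputStream merely stands in for an HDFS output stream; in a real job the loop body would live inside the function you pass to mapPartitions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.Iterator;

final class PartitionBufferSketch {

    // Hypothetical stand-in for "do some calculation to get a byte array".
    // Writes the record's chars (assumed ASCII) into buf and returns the length used.
    static int encodeInto(String record, byte[] buf) {
        int len = record.length();
        for (int i = 0; i < len; i++) {
            buf[i] = (byte) record.charAt(i);
        }
        return len;
    }

    // Processes one partition: a single buffer is allocated up front and
    // reused for every record instead of creating a new byte[] per record.
    static int writePartition(Iterator<String> records, OutputStream out) throws IOException {
        byte[] buf = new byte[64]; // assumed initial capacity
        int count = 0;
        while (records.hasNext()) {
            String record = records.next();
            if (record.length() > buf.length) {
                // Grow only when a record does not fit; this should be rare.
                buf = new byte[Math.max(record.length(), buf.length * 2)];
            }
            int len = encodeInto(record, buf);
            out.write(buf, 0, len); // consume the bytes before the buffer is overwritten
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n = writePartition(Arrays.asList("abc", "de").iterator(), out);
        System.out.println(n + " records, " + out.size() + " bytes"); // prints "2 records, 5 bytes"
    }
}
```

One caveat: the buffer is only safe to reuse if each record's bytes are written out (or copied) before the next record is processed. If you return the shared array itself as the output of mapPartitions and something downstream caches or collects it, every element will end up pointing at the same bytes.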

On Friday, August 14, 2015, 周千昊 <[email protected]> wrote:

> I am thinking of creating a shared object outside the closure and using
> it to hold the byte array.
> Will this work?
>
> On Friday, August 14, 2015 at 4:02 PM, 周千昊 <[email protected]> wrote:
>
>> Hi,
>>     All I want to do is:
>>     1. read from some source
>>     2. do some calculation to get a byte array
>>     3. write the byte array to HDFS
>>     In Hadoop, I can share an ImmutableBytesWritable and fill it with
>> System.arraycopy, which prevents the application from creating a lot of
>> small objects and so reduces GC latency.
>>     *However, I was wondering whether there is any similar solution in
>> Spark that can avoid creating small objects.*
>>
>
