You can use mapPartitions to do that.

On Friday, August 14, 2015, 周千昊 <qhz...@apache.org> wrote:
> I am thinking of creating a shared object outside the closure and using
> that object to hold the byte array.
> Will this work?
>
> 周千昊 <qhz...@apache.org> wrote on Friday, August 14, 2015 at 4:02 PM:
>
>> Hi,
>> All I want to do is:
>> 1. read from some source
>> 2. do some calculation to get some byte array
>> 3. write the byte array to hdfs
>> In Hadoop, I can share an ImmutableByteWritable and do some
>> System.arraycopy, which prevents the application from creating a lot of
>> small objects and improves the GC latency.
>> *However, I was wondering if there is any similar solution in
>> Spark that can avoid creating small objects.*
>>
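
To make the mapPartitions suggestion a bit more concrete, here is a rough sketch of the per-partition logic in plain Java, with no Spark dependency so it can be run on its own. The class name PartitionBufferSketch, the methods encodeInto and writePartition, and the 4-byte integer encoding are all hypothetical stand-ins for the real calculation. The point is only that one buffer is allocated per partition and reused for every record, instead of allocating a new byte array per record.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical sketch: names and the encoding are illustrative assumptions.
class PartitionBufferSketch {

    // Stand-in for "do some calculation to get some byte array":
    // writes the value into the caller's buffer as 4 big-endian bytes
    // and returns the number of bytes used. No allocation happens here.
    static int encodeInto(int value, byte[] buffer) {
        buffer[0] = (byte) (value >>> 24);
        buffer[1] = (byte) (value >>> 16);
        buffer[2] = (byte) (value >>> 8);
        buffer[3] = (byte) value;
        return 4;
    }

    // Processes one partition: the buffer is created once and reused
    // for every record. Returns the total number of bytes written.
    static long writePartition(Iterator<Integer> records, OutputStream out)
            throws IOException {
        byte[] buffer = new byte[4]; // allocated once per partition
        long written = 0;
        while (records.hasNext()) {
            int len = encodeInto(records.next(), buffer);
            out.write(buffer, 0, len); // bytes are copied out before reuse
            written += len;
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for an HDFS output stream.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long n = writePartition(Arrays.asList(1, 2, 258).iterator(), out);
        System.out.println(n + " bytes written");
    }
}
```

In a Spark job, a function shaped like writePartition would presumably be the body of the function passed to mapPartitions (or foreachPartition, if nothing needs to be returned), with the output stream opened on HDFS once per partition; that wiring is not shown here. One caveat: the buffer is only safe to reuse because each record's bytes are written out before the next record overwrites them. Returning the same buffer object as the RDD element would be unsafe if anything downstream holds on to the elements.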