Thanks for the clarification. I still find it strange, though, that get() doesn't simply return what's actually stored, regardless of the buffer size. Is there any reason you'd want to use or examine the rest of the buffer?
Todd Lipcon-4 wrote:
>
> Hi Bing,
>
> The issue here is that BytesWritable uses an internal buffer which is grown
> but not shrunk. The cause of this is that Writables in general are single
> instances that are shared across multiple input records. If you look at the
> internals of the input reader, you'll see that a single BytesWritable is
> instantiated, and then each time a record is read, it's read into that same
> instance. The purpose here is to avoid the allocation cost for each row.
>
> The end result is, as you've seen, that getBytes() returns an array which
> may be larger than the actual amount of data. In fact, the extra bytes
> (between .getSize() and .get().length) have undefined contents, not zero.
>
> Unfortunately, if the protobuffer API doesn't allow you to deserialize out
> of a smaller portion of a byte array, you're out of luck and will have to do
> the copy like you've mentioned. I imagine, though, that there's some way
> around this in the protobuffer API - perhaps you can use a
> ByteArrayInputStream here to your advantage.
>
> Hope that helps
> -Todd
>
> On Wed, Apr 8, 2009 at 4:59 PM, bzheng <[email protected]> wrote:
>
>>
>> I tried to store protocolbuffer as BytesWritable in a sequence file <Text,
>> BytesWritable>. It's stored using SequenceFile.Writer(new Text(key), new
>> BytesWritable(protobuf.convertToBytes())). When reading the values from
>> key/value pairs using value.get(), it returns more than what's stored.
>> However, value.getSize() returns the correct number. This means in order to
>> convert the byte[] to protocol buffer again, I have to do
>> Arrays.copyOf(value.get(), value.getSize()). This happens on both version
>> 0.17.2 and 0.18.3. Does anyone know why this happens? Sample sizes for a
>> few entries in the sequence file below. The extra bytes in value.get() all
>> have values of zero.
>>
>> value.getSize(): 7066   value.get().length: 10599
>> value.getSize(): 36456  value.get().length: 54684
>> value.getSize(): 32275  value.get().length: 54684
>> value.getSize(): 40561  value.get().length: 54684
>> value.getSize(): 16855  value.get().length: 54684
>> value.getSize(): 66304  value.get().length: 99456
>> value.getSize(): 26488  value.get().length: 99456
>> value.getSize(): 59327  value.get().length: 99456
>> value.getSize(): 36865  value.get().length: 99456
>>
>> --
>> View this message in context:
>> http://www.nabble.com/BytesWritable-get%28%29-returns-more-bytes-then-what%27s-stored-tp22962146p22962146.html
>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>>
>
>

--
View this message in context: http://www.nabble.com/BytesWritable-get%28%29-returns-more-bytes-then-what%27s-stored-tp22962146p22963309.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
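
For anyone who finds this thread later, here is a rough, self-contained sketch of the behaviour discussed above and the two ways around it. It does not use Hadoop or protobuf: GrowOnlyBuffer is a hypothetical stand-in for a reused BytesWritable, and its 3/2 growth factor is only an assumption chosen because it matches the sample sizes in the thread (7066 -> 10599, 36456 -> 54684). The point is just that the valid data is the first getSize() bytes, and that a ByteArrayInputStream can expose exactly that range without a copy.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Arrays;

class ReusedBufferDemo {

    /** Hypothetical stand-in for a reused, grow-only buffer (not the Hadoop class). */
    static final class GrowOnlyBuffer {
        private byte[] bytes = new byte[0];
        private int size = 0;

        /** Copies src in, growing the backing array if needed but never shrinking it. */
        void set(byte[] src) {
            if (src.length > bytes.length) {
                // Assumed growth policy: 1.5x the requested size.
                bytes = new byte[src.length * 3 / 2];
            }
            System.arraycopy(src, 0, bytes, 0, src.length);
            size = src.length;
        }

        byte[] get() { return bytes; }   // whole backing array, may be oversized
        int getSize() { return size; }   // number of valid bytes
    }

    /** Option 1: copy out only the valid bytes (one allocation per record). */
    static byte[] copyValid(GrowOnlyBuffer b) {
        return Arrays.copyOf(b.get(), b.getSize());
    }

    /** Option 2: wrap the valid range in a stream, with no copy. */
    static InputStream streamValid(GrowOnlyBuffer b) {
        return new ByteArrayInputStream(b.get(), 0, b.getSize());
    }

    public static void main(String[] args) throws Exception {
        GrowOnlyBuffer value = new GrowOnlyBuffer();
        value.set(new byte[] {1, 2, 3, 4, 5, 6}); // first record
        value.set(new byte[] {7, 8});             // second record reuses the buffer

        System.out.println(value.get().length);   // 9
        System.out.println(value.getSize());      // 2
        // Bytes past getSize() are leftovers from the earlier record, not zeros.
        System.out.println(value.get()[2]);       // 3

        System.out.println(Arrays.toString(copyValid(value))); // [7, 8]

        InputStream in = streamValid(value);
        System.out.println(in.available());       // 2
    }
}
```

With the real classes, the stream from option 2 could presumably be handed to whatever stream-based parse method your generated protobuf class provides, which would avoid the per-record Arrays.copyOf. I have not tested that against 0.17.2 or 0.18.3, so treat it as a starting point.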
