Re: Problem with InputStream.skip()

Raghu Angadi Fri, 25 May 2007 15:57:23 -0700

Also, reading from a block supports a 'real skip', i.e., it does not check the checksum of any checksum block (usually 512 bytes) that falls entirely within the skip range. That is another reason to implement our own skip().
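To make the 'real skip' idea concrete, here is a minimal sketch of the arithmetic. It splits a skip range into bytes that must still be read and verified and bytes in whole checksum chunks that could be skipped without verification. The class RealSkipPlan and its plan() method are hypothetical illustrations, not Hadoop code. The sketch assumes checksum chunks are fixed-size and aligned at offset 0 of the block, and that partially covered chunks at either end are still read and verified.

```java
/**
 * Hypothetical helper (not Hadoop's code): splits a skip of n bytes starting at stream
 * position pos into three parts, assuming checksum chunks are aligned at offset 0.
 *   [0] head: bytes before the first whole chunk (read and verify)
 *   [1] seek: bytes in whole chunks entirely inside the range (skip without verifying)
 *   [2] tail: bytes after the last whole chunk (read and verify)
 */
class RealSkipPlan {
    static long[] plan(long pos, long n, int bytesPerChecksum) {
        long end = pos + n; // exclusive end of the skip range
        // Round pos up, and end down, to chunk boundaries.
        long firstWholeStart = ((pos + bytesPerChecksum - 1) / bytesPerChecksum) * bytesPerChecksum;
        long lastWholeEnd = (end / bytesPerChecksum) * bytesPerChecksum;
        if (lastWholeEnd <= firstWholeStart) {
            // No chunk lies entirely inside the range: every byte is read and verified.
            return new long[] {n, 0, 0};
        }
        return new long[] {
            firstWholeStart - pos,          // head
            lastWholeEnd - firstWholeStart, // whole chunks, no checksum needed
            end - lastWholeEnd              // tail
        };
    }

    public static void main(String[] args) {
        // 512 bytes per checksum, the usual size mentioned above.
        long[] p = plan(100, 1000, 512);
        System.out.println(p[0] + " " + p[1] + " " + p[2]);
    }
}
```

For a 1000-byte skip starting at position 100, only the 412 bytes up to the first chunk boundary and the 76 bytes after the last one need checksum work; the 512-byte chunk in between can be skipped outright.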


Raghu Angadi wrote:

In Hadoop, whenever possible, we read directly into the user's buffer. For example, in ChecksumFileSystem we read into the user buffer and then verify the checksum, and I do the same in the new block-level CRCs. This is very useful since it avoids an extra copy in most cases.
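A minimal sketch of the read-then-verify pattern described above, using java.util.zip.CRC32. The class DirectReadExample and its methods are hypothetical and are not the ChecksumFileSystem implementation; the sketch assumes the expected CRC for the range is already known. The data goes straight into the caller's array and the checksum is computed over that same memory, so no intermediate copy is made.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class DirectReadExample {
    /**
     * Reads exactly len bytes straight into the caller's buffer, then verifies a CRC32
     * computed over those same bytes. No intermediate buffer, so no extra copy.
     */
    static void readAndVerify(InputStream in, byte[] buf, int off, int len, long expectedCrc)
            throws IOException {
        int done = 0;
        while (done < len) {
            int n = in.read(buf, off + done, len - done);
            if (n < 0) throw new EOFException("stream ended after " + done + " bytes");
            done += n;
        }
        // The checksum is computed over the caller's memory. If anyone else can write to
        // buf concurrently (e.g. a shared scratch buffer), this check can fail spuriously.
        CRC32 crc = new CRC32();
        crc.update(buf, off, len);
        if (crc.getValue() != expectedCrc) {
            throw new IOException("checksum mismatch");
        }
    }

    static long crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello, block".getBytes(StandardCharsets.US_ASCII);
        byte[] userBuf = new byte[data.length];
        readAndVerify(new ByteArrayInputStream(data), userBuf, 0, data.length, crcOf(data));
        System.out.println(new String(userBuf, StandardCharsets.US_ASCII));
    }
}
```

The comment on the checksum step is the crux of the problem in this thread: the pattern is only safe if the caller's buffer is not being written by someone else at the same time.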
We don't define skip() for our extensions of InputStream since we know the default implementation calls read(). But the problem is that InputStream.skip() uses a *static* byte buffer (from its perspective, that makes sense). Because each read() fills that buffer and then checksums it in place, one stream can overwrite bytes that another is still verifying. So if two skip() calls run in parallel on unrelated streams, we will surely get checksum errors.
When this happened with the block-level CRCs, I wasted time trying to find a bug in the new code.
My preferred fix would be to implement skip() at the Hadoop level. Always copying to the user buffer would be a very defensive fix.
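One way the preferred fix could look, as a minimal sketch only: override skip() so that skipped bytes are read into a buffer owned by the stream instance, never into one shared across streams. The class PrivateSkipInputStream and the buffer size are hypothetical; it is written as a FilterInputStream wrapper so it is self-contained, whereas Hadoop's own streams extend InputStream directly. Because skip() calls this.read(), any checksum-verifying read() in a subclass still runs, but against private memory.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical wrapper (not Hadoop code): skipped bytes are read into a buffer owned by
 * this stream instance, so concurrent skip() calls on unrelated streams never share memory.
 */
class PrivateSkipInputStream extends FilterInputStream {
    private static final int SKIP_BUFFER_SIZE = 2048; // arbitrary size for the sketch
    private byte[] skipBuffer; // allocated lazily, one per instance (not static)

    PrivateSkipInputStream(InputStream in) {
        super(in);
    }

    @Override
    public long skip(long n) throws IOException {
        if (n <= 0) return 0;
        if (skipBuffer == null) skipBuffer = new byte[SKIP_BUFFER_SIZE];
        long remaining = n;
        while (remaining > 0) {
            int chunk = (int) Math.min(skipBuffer.length, remaining);
            // Goes through this.read(), so a subclass that verifies checksums in read()
            // still does so, but only ever on this instance's buffer.
            int r = read(skipBuffer, 0, chunk);
            if (r < 0) break; // end of stream
            remaining -= r;
        }
        return n - remaining; // bytes actually skipped
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[5000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        try (InputStream in = new PrivateSkipInputStream(new ByteArrayInputStream(data))) {
            long skipped = in.skip(4096);
            System.out.println(skipped + " " + in.read());
        }
    }
}
```

A single stream instance is still not safe for concurrent use from several threads, but that is the usual contract for streams; the point here is only that unrelated streams no longer collide.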
Raghu.
