apurtell commented on a change in pull request #3377:
URL: https://github.com/apache/hbase/pull/3377#discussion_r650325541
##########
File path:
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCellCodec.java
##########
@@ -381,8 +385,13 @@ private static void checkLength(int len, int max) throws IOException {
   private void readCompressedValue(InputStream in, byte[] outArray, int outOffset,
       int expectedLength) throws IOException {
     int compressedLen = StreamUtils.readRawVarint32(in);
-    int read = compression.getValueCompressor().decompress(in, compressedLen, outArray,
-      outOffset, expectedLength);
+    // A partial read of the compressed bytes, depending on which compression codec is used,
+    // can cause messy IO errors. This can happen when the reader is actively tailing a file
+    // being written, for replication.
+    byte[] buffer = new byte[compressedLen];
+    IOUtils.readFully(in, buffer, 0, compressedLen);
Review comment:
> Trying to understand how this actually works, are we relying on the EOFException thrown by readFully here so that upper layers in ProtobufLogReader#next() handle it?
No.
Well, yes, that would happen, and that is part of the fix, but it is not the whole story. It's actually more important to avoid the case where the decompressor has insufficient input to decompress the WALedit's value.
There are two problems if we use the input stream directly:
- If we do not read in the complete segment of the compressed stream, the decompressor, depending on type, will throw unpredictable exceptions, maybe IOExceptions, maybe others. These are not EOFExceptions, and they permanently confuse the log reader.
- Sometimes the input stream lies about the number of available bytes. I attempted to use InputStream#available to ensure that, after we read in the vint saying how many compressed bytes follow, at least that many bytes are available to read, but I sometimes still got short reads even when a sufficient number of bytes were alleged to be available. (It's also possible I made an error attempting to implement this.)
While decompressing the short-read bytes, the decompressor will emit bytes into its output stream. If we were going to catch such exceptions and convert them to EOFException, we would still need to rewind both the reader and the output stream. I could explore this alternative, but is it less expensive than just copying in the compressed bytes first? The most common case for reading WALs will be replication's active tailing.
Flushing the writer is insufficient, although I do that too in this patch, to attempt to minimize the time where a tailer might not have a complete WALedit serialization at the current end of file.
The fix is to always provide a complete segment of the compressed stream to the decompressor. A buffer and IOUtils.readFully are required for that.
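The shape of the fix can be sketched end to end. This is a self-contained approximation, not HBase's actual code: java.util.zip's deflate stands in for the pluggable value compressor, and `readRawVarint32`/`readFully` are minimal reimplementations of HBase's `StreamUtils.readRawVarint32` and Hadoop's `IOUtils.readFully`:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class WalValueReadSketch {

  // Stand-in for StreamUtils.readRawVarint32: protobuf-style little-endian varint.
  static int readRawVarint32(InputStream in) throws IOException {
    int shift = 0, result = 0;
    while (shift < 32) {
      int b = in.read();
      if (b < 0) throw new EOFException("EOF while reading varint");
      result |= (b & 0x7f) << shift;
      if ((b & 0x80) == 0) return result;
      shift += 7;
    }
    throw new IOException("Malformed varint");
  }

  // Stand-in for Hadoop's IOUtils.readFully: loops until len bytes are read,
  // or throws EOFException, so the decompressor never sees a short read.
  static void readFully(InputStream in, byte[] buf, int off, int len) throws IOException {
    while (len > 0) {
      int n = in.read(buf, off, len);
      if (n < 0) throw new EOFException(len + " bytes short of a complete segment");
      off += n;
      len -= n;
    }
  }

  // The fix: buffer the complete compressed segment first, then hand the
  // decompressor an in-memory stream that cannot short-read.
  static byte[] readCompressedValue(InputStream in, int expectedLength) throws IOException {
    int compressedLen = readRawVarint32(in);
    byte[] buffer = new byte[compressedLen];
    readFully(in, buffer, 0, compressedLen);
    byte[] out = new byte[expectedLength];
    try (InputStream inf = new InflaterInputStream(new ByteArrayInputStream(buffer))) {
      new DataInputStream(inf).readFully(out);
    }
    return out;
  }

  // Round-trip helper for demonstration: write a length-prefixed deflated
  // segment, then read it back through readCompressedValue.
  static byte[] roundTrip(byte[] value) throws IOException {
    ByteArrayOutputStream comp = new ByteArrayOutputStream();
    try (DeflaterOutputStream dos = new DeflaterOutputStream(comp)) {
      dos.write(value);
    }
    byte[] compressed = comp.toByteArray();
    ByteArrayOutputStream segment = new ByteArrayOutputStream();
    int len = compressed.length;
    while ((len & ~0x7f) != 0) {      // emit the varint length prefix
      segment.write((len & 0x7f) | 0x80);
      len >>>= 7;
    }
    segment.write(len);
    segment.write(compressed);
    return readCompressedValue(new ByteArrayInputStream(segment.toByteArray()), value.length);
  }

  public static void main(String[] args) throws IOException {
    byte[] value = "some WAL cell value".getBytes(StandardCharsets.UTF_8);
    byte[] back = roundTrip(value);
    if (!java.util.Arrays.equals(value, back)) throw new AssertionError("round trip failed");
    System.out.println("ok");
  }
}
```

Because the decompressor only ever sees the in-memory buffer, a truncated tail of the file surfaces as a clean EOFException from readFully, before any codec state exists.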
##########
File path:
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCellCodec.java
##########
@@ -381,8 +385,13 @@ private static void checkLength(int len, int max) throws IOException {
   private void readCompressedValue(InputStream in, byte[] outArray, int outOffset,
       int expectedLength) throws IOException {
     int compressedLen = StreamUtils.readRawVarint32(in);
-    int read = compression.getValueCompressor().decompress(in, compressedLen, outArray,
-      outOffset, expectedLength);
+    // A partial read of the compressed bytes, depending on which compression codec is used,
+    // can cause messy IO errors. This can happen when the reader is actively tailing a file
+    // being written, for replication.
+    byte[] buffer = new byte[compressedLen];
+    IOUtils.readFully(in, buffer, 0, compressedLen);
Review comment:
> Where is this output stream that we need to rewind? Isn't that the job
of the compression codec to clean up the state if the read() fails (or did I
misunderstand something?)
The decompressor reads from the input stream, advancing it.
The decompressor writes to the output stream while decompressing, advancing
it.
If the current decompression encounters a short read -- where the input stream
claimed all of the data was available but it actually wasn't* -- both the input
and output streams will have advanced. The codec will have read in some data,
and written out some decompressed bytes. This all has to be undone because the
log reader will rewind to the last known good location and retry (assuming we
convert this to an EOFException).
* - And there is the related problem of looping around available() until
more bytes are available. Does that work? Without reading anyway, will the
stream even make progress?
When tailing a log file for replication, this problem repeats constantly.
It is better to use IOUtils.readFully to ensure all compressed bytes for the
value are available before invoking the codec. Nothing more needs to be done,
and the extra copy is cheap: what is read into the buffer is only the
compressed bytes.
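As a sketch of this point (standalone illustrative Java, not the actual WALCellCodec code; the class name and the inlined readFully are stand-ins for the real IOUtils helper): readFully either fills the buffer with the complete compressed segment, or throws EOFException, which the reader can treat as retryable. The decompressor is never handed a partial segment.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullySketch {
  // Reads exactly len bytes or throws EOFException, like IOUtils.readFully.
  static void readFully(InputStream in, byte[] buf, int off, int len) throws IOException {
    int total = 0;
    while (total < len) {
      int n = in.read(buf, off + total, len - total);
      if (n < 0) {
        throw new EOFException("Premature EOF: read " + total + " of " + len + " bytes");
      }
      total += n;
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] segment = "compressed-bytes".getBytes("UTF-8");
    byte[] buf = new byte[segment.length];
    // Complete segment: readFully succeeds, and the whole segment can be
    // handed to the decompressor as one unit.
    readFully(new ByteArrayInputStream(segment), buf, 0, segment.length);
    System.out.println(new String(buf, "UTF-8"));
    // Truncated segment, as seen when tailing a file mid-write: readFully
    // throws EOFException before the decompressor ever sees any bytes.
    try {
      readFully(new ByteArrayInputStream(segment, 0, 4), buf, 0, segment.length);
    } catch (EOFException e) {
      System.out.println("EOF, rewind and retry");
    }
  }
}
```

The key property is that failure happens before the decompressor runs, so neither the codec's internal state nor the output stream has advanced.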
Review comment:
> What I had in my mind was an "intercepting" input stream that wraps
this compressed input stream and keeps track of bytes read so far. It
essentially does what IOUtils.readFully() does but without copying it into a
buffer. It just throws EOFException when it is end of stream and readBytesSoFar
< totalBytesToBeRead.
Will the wrapped input stream make more bytes available unless someone
read()s?
If we have to read to get the stream to advance, is this different from
IOUtils.readFully?
I can certainly implement an input stream that wraps another input stream
and, when given an explicit amount of data to read, uses a buffer to accumulate
those bytes before returning control to its caller, if you prefer. The current
patch is equivalent but does it directly in WALCellCodec instead.
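A sketch of that wrapper idea (hypothetical class and names, not an HBase API): the wrapper accumulates the complete segment into a buffer up front, so the codec's subsequent read()s and available() calls can be trusted. Note that the buffer is still there, just moved inside the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BoundedSegmentStream extends InputStream {
  private final byte[] buf;
  private int pos;

  // Accumulates exactly segmentLen bytes from the underlying stream up front;
  // DataInputStream.readFully throws EOFException on a short read.
  public BoundedSegmentStream(InputStream in, int segmentLen) throws IOException {
    buf = new byte[segmentLen];
    new DataInputStream(in).readFully(buf);
  }

  @Override
  public int read() {
    return pos < buf.length ? (buf[pos++] & 0xFF) : -1;
  }

  @Override
  public int available() {
    // The codec can trust this: the whole segment is already in memory.
    return buf.length - pos;
  }

  public static void main(String[] args) throws IOException {
    InputStream raw = new ByteArrayInputStream("segmentTRAILING".getBytes("UTF-8"));
    InputStream seg = new BoundedSegmentStream(raw, 7);
    StringBuilder sb = new StringBuilder();
    int b;
    while ((b = seg.read()) != -1) {
      sb.append((char) b);
    }
    // Only the 7-byte segment is visible; trailing bytes stay in the raw stream.
    System.out.println(sb);
  }
}
```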
Review comment:
> Where is this output stream that we need to rewind? Isn't that the job
of the compression codec to clean up the state if the read() fails (or did I
misunderstand something?)
The decompressor reads from the input stream, advancing it.
The decompressor writes to the output stream while decompressing, advancing
it.
If the current decompression encounters a short read -- where the input stream
claimed all of the data was available but it actually wasn't* -- both the input
and output streams will have advanced by the time the codec finally runs out of
input bits and throws an exception.
\* \- There is the related problem of looping around available() until more
bytes are available. Without reading anyway, will the stream even make
progress?
There is also the problem of deciding what exceptions to convert to
EOFException. I don't think we want to do that for data corruption cases. If
the data is corrupt, rewinding and retrying repeatedly will not help. It would
be better to fail out the reader right away because it will never succeed.
When tailing a log file for replication, this problem repeats constantly.
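The distinction argued above can be sketched like this (illustrative only: java.util.zip stands in for HBase's compression codecs, and the method name is hypothetical). A short read surfaces as EOFException before decompression, which is retryable after a rewind; a decompression failure on a fully-read segment is corruption, where retrying will never help.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ClassifyFailures {
  static byte[] decompressSegment(InputStream in, int compressedLen, int expectedLen)
      throws IOException {
    byte[] compressed = new byte[compressedLen];
    int total = 0;
    while (total < compressedLen) {
      int n = in.read(compressed, total, compressedLen - total);
      if (n < 0) {
        // Short read: temporary EOF, the caller may rewind and retry.
        throw new EOFException("partial segment");
      }
      total += n;
    }
    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    byte[] out = new byte[expectedLen];
    try {
      inflater.inflate(out);
    } catch (DataFormatException e) {
      // Full segment but undecodable: corruption, fail the reader outright.
      throw new IOException("corrupt segment", e);
    } finally {
      inflater.end();
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    byte[] plain = "value".getBytes("UTF-8");
    Deflater deflater = new Deflater();
    deflater.setInput(plain);
    deflater.finish();
    byte[] compressed = new byte[64];
    int len = deflater.deflate(compressed);
    deflater.end();
    // Complete segment decompresses cleanly.
    byte[] out = decompressSegment(new ByteArrayInputStream(compressed, 0, len), len, plain.length);
    System.out.println(new String(out, "UTF-8"));
    // Truncated stream: classified as a retryable EOF, not corruption.
    try {
      decompressSegment(new ByteArrayInputStream(compressed, 0, len / 2), len, plain.length);
    } catch (EOFException e) {
      System.out.println("retryable EOF");
    }
  }
}
```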
Review comment:
> Where is this output stream that we need to rewind? Isn't that the job
of the compression codec to clean up the state if the read() fails (or did I
misunderstand something?)
Yeah I didn't communicate that well. The decompressor reads from the input
stream, advancing it. The decompressor writes to the output stream consumed by
the log reader while decompressing, advancing it. The number of bytes output
exceeds the number of bytes input. If the current decompression encounters a
short read -- where the input stream claimed all of the data was available but
it actually wasn't* -- it throws an exception, and in the current code it is
the job of the WAL reader to clean this up by rewinding the input and retrying;
it becomes confused about position.
\* \- There is the related problem of looping around available() until more
bytes are available. Without reading anyway, will the stream even make
progress?
There is also the problem of deciding what exceptions to convert to
EOFException. I don't think we want to do that for data corruption cases. If
the data is corrupt rewinding and retrying repeatedly will not help. It would
be better to fail out the reader right away because it will never succeed.
When tailing a log file for replication the possibility of partial reads of
serialized WALedits at end of file is high.
Review comment:
> What I had in my mind was an "intercepting" input stream that wraps
this compressed input stream and keeps track of bytes read so far. It
essentially does what IOUtils.readFully() does but without copying it into a
buffer. It just throws EOFException when it is end of stream and readBytesSoFar
< totalBytesToBeRead.
This would almost work, but the "intercepting" stream has to continue to
read/retry to trigger the IO up the stream hierarchy until totalBytesToBeRead
have eventually been read in, when they become available. This isn't a
permanent EOF; it's a temporary EOF while we are waiting for some buffering
somewhere in the stack to flush.
I will give this an attempt and get back to you. @bharathv
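The rewind-and-retry behavior being relied on here can be sketched like this (toy standalone Java; the growing byte[] stands in for a WAL file being appended to, and none of these names are HBase APIs). On a temporary EOF, the reader resets to the last known good position and retries once more data is visible.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;

public class TailRetrySketch {
  public static void main(String[] args) throws IOException {
    byte[] full = "entry-one|entry-two|".getBytes("UTF-8");
    int visible = 12;      // the writer has flushed only part of the file so far
    long lastGoodPos = 10; // position after the last fully-read entry

    for (int attempt = 0; attempt < 2; attempt++) {
      ByteArrayInputStream in = new ByteArrayInputStream(full, 0, visible);
      in.skip(lastGoodPos);        // rewind to the last known good position
      byte[] entry = new byte[10]; // next entry is 10 bytes: "entry-two|"
      try {
        readFully(in, entry);
        System.out.println(new String(entry, "UTF-8"));
        break;
      } catch (EOFException e) {
        System.out.println("temporary EOF, retrying");
        visible = full.length; // the writer flushes the rest; the reader retries
      }
    }
  }

  static void readFully(ByteArrayInputStream in, byte[] buf) throws EOFException {
    int total = 0;
    while (total < buf.length) {
      int n = in.read(buf, total, buf.length - total);
      if (n < 0) throw new EOFException();
      total += n;
    }
  }
}
```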
Review comment:
No, I don't understand how you want me to do this differently. We don't
control the read() in the codec implementation. We can't modify them to use
some new readFully() method of an input stream. If we try to hide this in the
input stream implementation itself, i.e. override available() to read into a
buffer to ensure availability and then feed that buffer into read() until it is
empty, we've just complicated things for no gain: there is still a buffer that
is filled to ensure availability.
Review comment:
> essentially does what IOUtils.readFully() does but without copying it
into a buffer.
This is the part that seems impossible. In order to do what
IOUtils.readFully does you have to actually read(), and what is read in has to
go *somewhere*, and it can't go to the codec until finished, because the codec
does arbitrary read()s assuming the stream has everything it needs. We get one
chance to assure all bytes are available, waiting and doing IO as necessary
until that is the case, and that is before we pass control to the decompressor,
in WALCellCodec.
Review comment:
Wait.
I just found something very stupid but effective to do, and yet valid. No
copy.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]