[ https://issues.apache.org/jira/browse/AVRO-813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14747050#comment-14747050 ]
Lewis John McGibbney commented on AVRO-813:
-------------------------------------------
Can anyone confirm if the EOFException looks like the following?
{code}
Caused by: java.io.EOFException
    at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
    at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
    at org.apache.avro.io.BinaryDecoder.readIndex(BinaryDecoder.java:423)
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:178)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
    at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:240)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:230)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:174)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
    at org.apache.hadoop.io.serializer.avro.AvroSerialization$AvroDeserializer.deserialize(AvroSerialization.java:127)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:146)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
    at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:302)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
{code}
> EOFException is thrown during normal operation
> ----------------------------------------------
>
> Key: AVRO-813
> URL: https://issues.apache.org/jira/browse/AVRO-813
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.5.0
> Reporter: Bruno Dumon
> Assignee: Bruno Dumon
> Labels: memex
> Fix For: 1.8.0
>
> Attachments: avro-813-patch.txt
>
>
> In an application that uses Avro as its RPC mechanism (with the
> NettyTransceiver, but that's irrelevant), I noticed in JProfiler that a
> measurable share of time during normal operation was spent creating
> EOFExceptions:
> {noformat}
> 5.4% - 2,004 ms org.apache.avro.ipc.generic.GenericResponder.readRequest
> 5.0% - 1,871 ms org.apache.avro.generic.GenericDatumReader.read
> 4.9% - 1,832 ms org.apache.avro.generic.GenericDatumReader.read
> 4.9% - 1,832 ms org.apache.avro.generic.GenericDatumReader.readRecord
> 4.5% - 1,670 ms org.apache.avro.generic.GenericDatumReader.read
> 4.5% - 1,670 ms org.apache.avro.generic.GenericDatumReader.readRecord
> 4.3% - 1,596 ms org.apache.avro.generic.GenericDatumReader.read
> 2.8% - 1,048 ms org.apache.avro.generic.GenericDatumReader.readArray
> 1.3% - 477 ms org.apache.avro.io.ValidatingDecoder.arrayNext
> 1.3% - 471 ms org.apache.avro.io.BinaryDecoder.arrayNext
> 1.3% - 466 ms org.apache.avro.io.BinaryDecoder.doReadItemCount
> 1.3% - 466 ms org.apache.avro.io.BinaryDecoder.readLong
> 1.3% - 466 ms org.apache.avro.io.BinaryDecoder.ensureBounds
> 1.3% - 466 ms org.apache.avro.io.BinaryDecoder$ByteSource.compactAndFill
> 1.3% - 466 ms org.apache.avro.io.BinaryDecoder$InputStreamByteSource.tryReadRaw
> 1.3% - 466 ms org.apache.avro.util.ByteBufferInputStream.read
> 1.3% - 466 ms org.apache.avro.util.ByteBufferInputStream.getBuffer
> 1.3% - 466 ms java.io.EOFException.<init>
> 1.3% - 466 ms java.io.IOException.<init>
> 1.2% - 460 ms java.lang.Exception.<init>
> 1.2% - 460 ms java.lang.Throwable.<init>
> 1.2% - 460 ms java.lang.Throwable.fillInStackTrace
> {noformat}
> These exceptions are thrown by ByteBufferInputStream (which deviates from
> InputStream's contract of returning -1 at EOF), but are caught higher up by
> the tryReadRaw method.
> What happens is this:
> The message in question ends with an (empty) array, so the reader tries to
> read the size of this array in BinaryDecoder.readLong. This calls
> ensureBounds(10), whose contract is to read 10 bytes if they are available
> and otherwise stay quiet. ensureBounds calls the tryReadRaw method via
> compactAndFill. It is tryReadRaw that catches the EOFException, because it
> only 'tries' to read that many bytes.
> Note that InputStreamByteSource.readRaw (without the 'try' part) itself
> checks whether read < 0 and throws EOFException in that case, which makes
> the EOFException thrown by ByteBufferInputStream unnecessary (for this
> particular usage).
> There was some discussion of EOFException in AVRO-392 too, though it seems
> this particular common case was not mentioned there. When using Avro RPC, or
> more generally when using Avro to read small messages rather than large
> files, it seems one can very easily run into this EOFException situation,
> which hurts performance.
> I'll attach a patch which simply removes the throwing of EOFException in
> ByteBufferInputStream, but this will likely break other cases which rely on
> the EOFException being thrown (I haven't investigated this fully).
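
For readers following along, here is a minimal sketch of the pattern the description above walks through. It is not Avro's code: EofDemo and ThrowingAtEofStream are hypothetical stand-ins, and tryReadRaw/readRaw only mimic the behaviour described for the methods of the same name. The one-byte message is assumed to be the block count of an empty array (a single 0 byte), and the 10-byte request mirrors ensureBounds(10).

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class EofDemo {
    /** Hypothetical stream that signals end of input by throwing instead of returning -1. */
    static final class ThrowingAtEofStream extends InputStream {
        private final InputStream in;

        ThrowingAtEofStream(byte[] data) {
            this.in = new ByteArrayInputStream(data);
        }

        @Override
        public int read() throws IOException {
            int b = in.read();
            if (b < 0) throw new EOFException(); // pays for fillInStackTrace each time
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            if (len == 0) return 0;
            int n = in.read(buf, off, len);
            if (n < 0) throw new EOFException(); // same cost on the bulk path
            return n;
        }
    }

    /** Reads up to len bytes and returns how many were read; end of input is not an error. */
    static int tryReadRaw(InputStream in, byte[] buf, int off, int len) throws IOException {
        int total = 0;
        try {
            while (total < len) {
                int n = in.read(buf, off + total, len - total);
                if (n < 0) break; // a contract-abiding stream ends up here
                total += n;
            }
        } catch (EOFException e) {
            // A throwing stream ends up here; the exception is created only to be discarded.
        }
        return total;
    }

    /** Reads exactly len bytes; this is where a short read is a genuine error. */
    static void readRaw(InputStream in, byte[] buf, int off, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(buf, off + total, len - total);
            if (n < 0) throw new EOFException();
            total += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] message = {0};      // assumed: block count of an empty array
        byte[] buf = new byte[10]; // mirrors ensureBounds(10)
        System.out.println(tryReadRaw(new ThrowingAtEofStream(message), buf, 0, 10));
        System.out.println(tryReadRaw(new ByteArrayInputStream(message), buf, 0, 10));
    }
}
```

Both calls report one byte read, so the caller sees the same result either way; the difference is that the throwing stream builds an exception and its stack trace for every small message, and that is the cost showing up in the profile above. Since readRaw already turns a negative read into EOFException, the strict path would keep its behaviour in this sketch even if the stream returned -1.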