[ 
https://issues.apache.org/jira/browse/ORC-854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384338#comment-17384338
 ] 

David Mollitor commented on ORC-854:
------------------------------------

{code:none|title=ORC-854}
data/generated/taxi/orc.none rows: 22773249 batches: 22240 time: 26957ms
data/generated/taxi/orc.snappy rows: 22773249 batches: 22240 time: 56788ms
data/generated/taxi/orc.gz rows: 22773249 batches: 22240 time: 93950ms
{code}

{code:none|title=main}
data/generated/taxi/orc.none rows: 22773249 batches: 22240 time: 27986
data/generated/taxi/orc.snappy rows: 22773249 batches: 22240 time: 59811
data/generated/taxi/orc.gz rows: 22773249 batches: 22240 time: 98614
{code}
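
To put the two runs side by side, the sketch below computes the relative reduction in elapsed time per codec. It assumes the `main` timings are also in milliseconds (the log omits the unit); the class and method names are made up for illustration. On these figures the patched branch comes out about 3.7% (none), 5.1% (snappy), and 4.7% (gz) faster, from a single run each.

```java
import java.util.Locale;

class BenchmarkDelta {

  // Percentage reduction in elapsed time going from the baseline to the patched run.
  static double percentFaster(long baselineMs, long patchedMs) {
    return 100.0 * (baselineMs - patchedMs) / baselineMs;
  }

  public static void main(String[] args) {
    String[] codecs = {"none", "snappy", "gz"};
    long[] mainMs = {27986, 59811, 98614};    // "main" block; unit assumed to be ms
    long[] patchedMs = {26957, 56788, 93950}; // "ORC-854" block

    for (int i = 0; i < codecs.length; i++) {
      System.out.printf(Locale.ROOT, "%-6s %.1f%% faster%n",
          codecs[i], percentFaster(mainMs[i], patchedMs[i]));
    }
  }
}
```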

> Optimize ReadFully for Full Reads
> ---------------------------------
>
>                 Key: ORC-854
>                 URL: https://issues.apache.org/jira/browse/ORC-854
>             Project: ORC
>          Issue Type: Improvement
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor
>
> {code:java|title=SerializationUtils.java}
>   private void readFully(final InputStream in, final byte[] buffer, final int off, final int len)
>       throws IOException {
>     int n = 0;
>     while (n < len) {
>       int count = in.read(buffer, off + n, len - n);
>       if (count < 0) {
>         throw new EOFException("Read past EOF for " + in);
>       }
>       n += count;
>     }
>   }
> {code}
> This code reads only small buffers, 4 or 8 bytes at a time, so it is very 
> unlikely to need more than one read from the underlying buffered data 
> stream. Optimize it by assuming that a read from the underlying source 
> will always return the requested number of bytes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
