[
https://issues.apache.org/jira/browse/ORC-854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384338#comment-17384338
]
David Mollitor commented on ORC-854:
------------------------------------
{code:none|title=ORC-854}
data/generated/taxi/orc.none rows: 22773249 batches: 22240 time: 26957ms
data/generated/taxi/orc.snappy rows: 22773249 batches: 22240 time: 56788ms
data/generated/taxi/orc.gz rows: 22773249 batches: 22240 time: 93950ms
{code}
{code:none|title=main}
data/generated/taxi/orc.none rows: 22773249 batches: 22240 time: 27986ms
data/generated/taxi/orc.snappy rows: 22773249 batches: 22240 time: 59811ms
data/generated/taxi/orc.gz rows: 22773249 batches: 22240 time: 98614ms
{code}
> Optimize ReadFully for Full Reads
> ---------------------------------
>
> Key: ORC-854
> URL: https://issues.apache.org/jira/browse/ORC-854
> Project: ORC
> Issue Type: Improvement
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Minor
>
> {code:java|title=SerializationUtils.java}
> private void readFully(final InputStream in, final byte[] buffer,
>                        final int off, final int len)
>     throws IOException {
>   int n = 0;
>   while (n < len) {
>     int count = in.read(buffer, off + n, len - n);
>     if (count < 0) {
>       throw new EOFException("Read past EOF for " + in);
>     }
>     n += count;
>   }
> }
> {code}
> This code reads only small buffers, 4 or 8 bytes at a time, so it is very
> unlikely to need more than one read from the underlying buffered data
> stream. Optimize this code by assuming that reading from the underlying
> source will always return the requested number of bytes.
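
One way the proposed fast path might look is sketched below. This is an illustration only, not the actual ORC-854 patch: the class name ReadFullySketch and the trickle() helper are hypothetical, and the method is made static so the example stands alone. It tries a single read first and falls back to the original loop only when that read comes up short, so correctness does not depend on the single-read assumption holding.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical stand-alone class; not part of SerializationUtils.
final class ReadFullySketch {

  static void readFully(final InputStream in, final byte[] buffer,
                        final int off, final int len) throws IOException {
    if (len == 0) {
      return; // same as the original loop: nothing to read
    }
    // Fast path: for 4/8-byte requests on a buffered stream, one read
    // is expected to return everything.
    int n = in.read(buffer, off, len);
    if (n == len) {
      return;
    }
    if (n < 0) {
      throw new EOFException("Read past EOF for " + in);
    }
    // Slow path: the first read was short, so keep reading as before.
    while (n < len) {
      int count = in.read(buffer, off + n, len - n);
      if (count < 0) {
        throw new EOFException("Read past EOF for " + in);
      }
      n += count;
    }
  }

  // Hypothetical helper: a stream that returns at most one byte per
  // read call, used here only to exercise the slow path.
  static InputStream trickle(final byte[] data) {
    return new ByteArrayInputStream(data) {
      @Override
      public synchronized int read(byte[] b, int off, int len) {
        return super.read(b, off, Math.min(len, 1));
      }
    };
  }

  public static void main(String[] args) throws IOException {
    byte[] data = {1, 2, 3, 4, 5, 6, 7, 8};
    byte[] out = new byte[8];
    readFully(new ByteArrayInputStream(data), out, 0, 8); // fast path
    readFully(trickle(data), out, 0, 8);                  // slow path
    System.out.println(Arrays.toString(out));
  }
}
```

The explicit len == 0 check keeps the original behavior for empty requests, since some streams return -1 at end of stream even when zero bytes are requested.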
--
This message was sent by Atlassian Jira
(v8.3.4#803005)