[
https://issues.apache.org/jira/browse/AVRO-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yin Huai updated AVRO-1208:
---------------------------
Attachment: AVRO-1208.1.patch
An initial implementation of the patch. I added two methods to ColumnValues.
Users can call startRowWithPrefetch(int numOfPrefetched) to ask Trevni to
prefetch numOfPrefetched blocks whenever it needs to read a new block from disk.
Internally, startRowWithPrefetch calls
startBlockWithPrefetch(int block, int numOfPrefetched), which reads
numOfPrefetched+1 blocks in total.
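
To make the read-ahead behaviour described above concrete, here is a minimal sketch. It is not the code in the attached patch and does not use Trevni's real ColumnValues class: PrefetchingColumnSketch, ensureBlock, and the in-memory list of block indexes are hypothetical stand-ins. Only the name startBlockWithPrefetch and its two parameters come from the comment above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a column reader; NOT Trevni's actual ColumnValues. */
class PrefetchingColumnSketch {
    private final int blockCount;                              // total blocks in this column
    private final List<Integer> buffered = new ArrayList<>();  // block indexes currently in memory
    private int diskReads = 0;                                 // contiguous read requests issued

    PrefetchingColumnSketch(int blockCount) {
        this.blockCount = blockCount;
    }

    /** Loads `block` plus up to numOfPrefetched following blocks in one contiguous read. */
    void startBlockWithPrefetch(int block, int numOfPrefetched) {
        buffered.clear();
        // Clamp at the end of the column so we never read past the last block.
        int last = Math.min(block + numOfPrefetched, blockCount - 1);
        for (int b = block; b <= last; b++) {
            buffered.add(b);
        }
        diskReads++;
    }

    /** Assumed helper: goes to disk only when `block` is not already buffered. */
    void ensureBlock(int block, int numOfPrefetched) {
        if (!buffered.contains(block)) {
            startBlockWithPrefetch(block, numOfPrefetched);
        }
    }

    int diskReads() {
        return diskReads;
    }

    List<Integer> buffered() {
        return buffered;
    }
}
```

In this sketch, walking a 10-block column in order with numOfPrefetched = 3 issues three reads (blocks 0-3, 4-7, 8-9) instead of ten.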
> Improve Trevni's performance on row-oriented data access
> --------------------------------------------------------
>
> Key: AVRO-1208
> URL: https://issues.apache.org/jira/browse/AVRO-1208
> Project: Avro
> Issue Type: Improvement
> Affects Versions: 1.7.3
> Reporter: Yin Huai
> Attachments: AVRO-1208.1.patch
>
>
> Trevni uses a 64KB internal buffer to store the values of a column. When
> accessing a column, it reads 64KB of data (ignoring compression and
> checksums) from the storage layer. However, when the table is accessed in
> a row-oriented fashion (an entire row needs to be handed over to the upper
> layer), in the worst case (a full table scan where all values in the table
> are the same size), every 64KB read can cause a seek.
> This JIRA is for discussing whether we should consider the data access
> pattern described above and, if so, how to improve the performance of Trevni.
> Row-oriented data processing engines, e.g. Hive, can benefit from this work.
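
The worst case in the description can be illustrated with a small seek-counting model. This is a rough sketch under simplifying assumptions, not a measurement of Trevni: RowScanSeekSketch and countSeeks are hypothetical names, each column is assumed to occupy one contiguous run of equally sized blocks, a seek is counted whenever a read does not start where the previous read ended (including the very first read), and OS read-ahead, compression, and checksums are ignored.

```java
/** Hypothetical model of a row-oriented full scan over column-major storage. */
class RowScanSeekSketch {
    /**
     * Counts seeks for a full scan that reads one value from every column per row.
     * Assumed layout: column c occupies file blocks [c * blocksPerColumn, (c + 1) * blocksPerColumn).
     */
    static long countSeeks(int columns, int blocksPerColumn, int rowsPerBlock, int numOfPrefetched) {
        long seeks = 0;
        long head = -1;                         // file position (in blocks) just after the last read
        int[] bufferedUpTo = new int[columns];  // per column: first block index not yet in memory
        long totalRows = (long) blocksPerColumn * rowsPerBlock;
        for (long row = 0; row < totalRows; row++) {
            int block = (int) (row / rowsPerBlock);
            for (int c = 0; c < columns; c++) {
                if (block >= bufferedUpTo[c]) {
                    // This column needs a new block: read it plus the prefetched ones.
                    long start = (long) c * blocksPerColumn + block;
                    int last = Math.min(block + numOfPrefetched, blocksPerColumn - 1);
                    if (start != head) {
                        seeks++;                // the read is not contiguous with the previous one
                    }
                    head = (long) c * blocksPerColumn + last + 1;
                    bufferedUpTo[c] = last + 1;
                }
            }
        }
        return seeks;
    }
}
```

Because all values are the same size in this model, every column crosses a block boundary on the same row, so the reads alternate between columns and each one lands at a non-contiguous position. Raising numOfPrefetched reduces the number of reads per column and therefore the number of seeks.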