[ https://issues.apache.org/jira/browse/AVRO-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603413#comment-13603413 ]
Yin Huai commented on AVRO-1208:
--------------------------------
Here are some comparisons between RCFile and Trevni. I mainly focus on data
reading.
Based on the current implementations, RCFile and Trevni have three major
differences.
# The row group size in RCFile is configurable. Trevni uses a single row group
per Trevni file, so applications need to horizontally partition a table into
multiple Trevni files, and each Trevni file needs to be smaller than the HDFS
block size (am I right?).
# When reading the needed columns of a row group, RCFile loads all of them at
once. So, applications need to wait for the I/O (reading all needed columns in
a row group) to finish before accessing any row in this row group. In a Trevni
file, a column is stored as many small blocks, which are the compression units.
When Trevni needs to read data from disk, applications only wait for Trevni to
read a few blocks before accessing a row.
# When reading the needed columns of a row group, RCFile loads them column by
column. For Trevni, applications need to decide how to read the needed
columns: they can read data row by row or column by column.
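Here is a rough sketch of the second difference. It counts the bytes that must be read before the first row of a row group can be accessed. This is a simplified model, not RCFile or Trevni code. It assumes that RCFile loads every needed column chunk of the row group first, and that Trevni only needs the first 64KB block of each needed column (ignoring compression and checksums). The function names and sizes are hypothetical.

```python
KB = 1024
MB = 1024 * KB

def rcfile_bytes_before_first_row(chunk_sizes):
    """Simplified RCFile model: every needed column chunk of the row group is
    loaded before any row in the group can be accessed."""
    return sum(chunk_sizes)

def trevni_bytes_before_first_row(chunk_sizes, block_size=64 * KB):
    """Simplified Trevni model: only the first block of each needed column is
    read before the first row is available (a smaller column is read whole)."""
    return sum(min(size, block_size) for size in chunk_sizes)

# Hypothetical row group: 3 needed columns, 32MB each.
needed = [32 * MB] * 3
print(rcfile_bytes_before_first_row(needed) // MB)  # 96
print(trevni_bytes_before_first_row(needed) // KB)  # 192
```

In this model, with three needed 32MB columns, RCFile reads 96MB before the first row is available, while Trevni reads 192KB.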
For a given table, applications need to set a suitable row group size for
RCFile. With a small row group size, each column within a row group is small
(a column in a row group is stored contiguously), so reading the needed
columns requires many seeks, which degrades read performance (this is
described in the Trevni specification). A small row group size can also cause
a read buffer to contain data from unneeded columns and make OS readahead less
effective (it cannot asynchronously prefetch data from the needed columns).
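The next sketch is a simplified model of how the row group size drives the number of seeks in RCFile. It makes some hypothetical assumptions. Values have a fixed size. No two needed column chunks in a row group are adjacent, so reading each chunk costs one seek. Real files also differ in compression and layout.

```python
import math

def rcfile_read_pattern(num_rows, rows_per_group, value_size, needed_columns):
    """Simplified model of scanning `needed_columns` columns in RCFile.

    Assumes fixed-size values and that no two needed column chunks in a row
    group are adjacent, so reading each chunk costs one seek.
    Returns (total seeks, bytes read per seek in a full row group)."""
    groups = math.ceil(num_rows / rows_per_group)
    seeks = groups * needed_columns
    chunk_bytes = rows_per_group * value_size
    return seeks, chunk_bytes

# 100 million rows of 8-byte values, 4 needed columns.
print(rcfile_read_pattern(100_000_000, 10_000, 8, 4))     # (40000, 80000)
print(rcfile_read_pattern(100_000_000, 4_000_000, 8, 4))  # (100, 32000000)
```

In this model, a 10,000-row group turns the scan into 40,000 seeks of about 78KB each. A 4,000,000-row group needs only 100 seeks of about 32MB each.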
To overcome the low I/O efficiency of RCFile, a large row group size can be
used. However, RCFile needs to read all needed columns of a row group at once,
so CPU and I/O may not be effectively overlapped (there is less benefit from
OS asynchronous readahead). Suppose that an application explicitly stores a
table in multiple RCFile files and every file has a single row group. When the
application processes data in a file, it is blocked until all needed data is
loaded from disk. In this example, we first wait on I/O and then wait on CPU.
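Below is a toy timing model of the I/O and CPU overlap argument. The data is split into hypothetical units (e.g. blocks), each with a read time and a processing time. The serial case matches loading a whole row group before processing it. The pipelined case assumes idealized, unbounded readahead: the disk keeps reading the next unit while the CPU processes the current one. All times are made-up milliseconds.

```python
def serial_time(io_ms, cpu_ms):
    """All needed data is loaded first, then processed: no I/O/CPU overlap."""
    return sum(io_ms) + sum(cpu_ms)

def pipelined_time(io_ms, cpu_ms):
    """Idealized, unbounded readahead: reads are issued back to back, and unit i
    is processed as soon as it has been read and unit i-1 has been processed."""
    io_done = 0
    cpu_done = 0
    for io_t, cpu_t in zip(io_ms, cpu_ms):
        io_done += io_t
        cpu_done = max(cpu_done, io_done) + cpu_t
    return cpu_done

# 10 hypothetical units, 10 ms to read and 8 ms to process each.
io, cpu = [10] * 10, [8] * 10
print(serial_time(io, cpu))     # 180
print(pipelined_time(io, cpu))  # 108
```

Under these assumptions, the pipelined total approaches the larger of the total I/O time and the total CPU time, rather than their sum.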
For the third difference, a large row group size in RCFile implies higher I/O
performance, since all needed columns in a row group are read column by
column. But for Trevni, since applications usually read the needed columns
row by row (it seems that AvroColumnReader reads data row by row, and the Hive
and Pig integrations of Trevni rely on this reader), the throughput of reading
data stored in Trevni can be significantly degraded by unnecessary disk seeks.
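This small simulation models the seek problem under a simplified, hypothetical layout. The needed columns are stored one after another, each as a contiguous run of 64KB blocks, and all values are 8 bytes (the worst case from the issue description). It compares two read orders. One reader assembles whole rows (as AvroColumnReader seems to). The other scans each column in full. A read counts as a seek whenever it does not continue from where the previous read ended. This is a model, not Trevni code.

```python
import math

BLOCK_SIZE = 64 * 1024  # 64KB blocks, ignoring compression and checksums

def row_by_row_reads(num_rows, values_per_block):
    """Block reads, in order, for a reader that assembles one full row at a time.
    values_per_block[c] is how many values of column c fit in one block."""
    reads = []
    current = [None] * len(values_per_block)
    for row in range(num_rows):
        for c, vpb in enumerate(values_per_block):
            block = row // vpb
            if block != current[c]:  # the buffered block of column c is exhausted
                current[c] = block
                reads.append((c, block))
    return reads

def column_by_column_reads(num_rows, values_per_block):
    """Block reads, in order, when each column is scanned in full before the next."""
    return [(c, b)
            for c, vpb in enumerate(values_per_block)
            for b in range(math.ceil(num_rows / vpb))]

def count_seeks(reads, num_rows, values_per_block):
    """Count seeks for columns stored back to back, each as contiguous blocks.
    A read that does not start where the previous one ended costs a seek;
    the first read counts as one seek."""
    offsets, next_offset = [], 0
    for vpb in values_per_block:
        offsets.append(next_offset)
        next_offset += math.ceil(num_rows / vpb)
    seeks, prev = 0, None
    for c, b in reads:
        pos = offsets[c] + b
        if prev is None or pos != prev + 1:
            seeks += 1
        prev = pos
    return seeks

# Three needed columns of 8-byte values, 20 blocks per column.
vpb = [BLOCK_SIZE // 8] * 3
rows = vpb[0] * 20
print(count_seeks(row_by_row_reads(rows, vpb), rows, vpb))        # 60
print(count_seeks(column_by_column_reads(rows, vpb), rows, vpb))  # 1
```

Here the three columns have 20 blocks each. Reading row by row costs a seek on every one of the 60 block reads. Reading column by column needs a single seek, because the three columns happen to be adjacent. With unneeded columns in between, it would be one seek per needed column.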
> Improve Trevni's performance on row-oriented data access
> --------------------------------------------------------
>
> Key: AVRO-1208
> URL: https://issues.apache.org/jira/browse/AVRO-1208
> Project: Avro
> Issue Type: Improvement
> Affects Versions: 1.7.3
> Reporter: Yin Huai
> Assignee: Yin Huai
> Attachments: AVRO-1208.1.patch, AVRO-1208.2.patch
>
>
> Trevni uses a 64KB internal buffer to store values of a column. When
> accessing a column, it reads 64KB (if we do not consider compression and
> checksums) of data from the storage layer. However, when the table is accessed
> in a row-oriented fashion (an entire row needs to be handed over to the upper
> layer), in the worst case (a full table scan where all values in the table
> are the same size), every 64KB read can cause a seek.
> This JIRA is for discussing whether we should consider the data access
> pattern mentioned above and, if so, how to improve the performance of Trevni.
> Row-oriented data processing engines, e.g. Hive, can benefit from this work.
--
This message is automatically generated by JIRA.