[ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631905#action_12631905 ]
Hong Tang commented on HADOOP-3315:
-----------------------------------
A preliminary study comparing the performance of TFile and SequenceFile.
Settings:
- OS: RHEL AS4 Nahant Update 2. Linux 2.6.9-55.ELsmp.
- Hardware: dual 2GHz CPU, 4GB main memory, WD Caviar 400GB (with 60GB free
space).
- Key length: uniform random 50-100B.
- Value length: uniform random 5K-10K.
- Both keys and values are composed using a "dictionary" of 1000 "words";
each word's length is uniformly distributed between 5-20B (see the generator
sketch after the finer details below).
- Compression schemes: none, lzo, and gz. The # of <key, value> pairs is 600K,
3M, and 3M for the three cases, and the output file sizes are 4.4G, 10G, and
6G respectively. The files are large enough to eliminate any file caching
effect.
- Used SequenceFile and TFile to implement two common interfaces, one for
writing and one for reading, as follows:
{code}
private interface KVAppendable {
  public void append(BytesWritable key, BytesWritable value)
      throws IOException;
  public void close() throws IOException;
}

private interface KVReadable {
  public byte[] getKey();
  public byte[] getValue();
  public int getKeyLength();
  public int getValueLength();
  public boolean next() throws IOException;
  public void close() throws IOException;
}
{code}
Both interfaces allow efficient implementations by either TFile or
SequenceFile, avoiding any object creation or buffer copying just to conform
to the interface.
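For illustration, the SequenceFile side of KVAppendable can be as thin as the
following sketch (not the actual benchmark code; the writer construction shown
is just one plausible setup, and it assumes the KVAppendable interface above
is visible):
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;

// Sketch only: adapts SequenceFile.Writer to the KVAppendable interface above.
class SeqFileAppendable implements KVAppendable {
  private final SequenceFile.Writer writer;

  public SeqFileAppendable(FileSystem fs, Configuration conf, Path path,
      SequenceFile.CompressionType compressionType) throws IOException {
    writer = SequenceFile.createWriter(fs, conf, path,
        BytesWritable.class, BytesWritable.class, compressionType);
  }

  // The BytesWritable pair is handed straight to the writer, so the adapter
  // itself adds no buffer copies and creates no objects per append.
  public void append(BytesWritable key, BytesWritable value)
      throws IOException {
    writer.append(key, value);
  }

  public void close() throws IOException {
    writer.close();
  }
}
{code}
Passing the BytesWritable pair straight through is what keeps adapter overhead
out of the timings.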
Some finer details:
- For writing, the timing includes the append() calls (so TFile meta data
writing is not included); it also includes the time used to compose keys and
values. The same seed is used to construct the Random object that does the
key/value composition.
- For reading, the timing includes the next() call followed by getKeyLength()
and getValueLength(). (getKey() and getValue() simply return internally cached
key/value buffers, and are O(1) operations.)
- For each compression scheme, I run and time the following tasks: create
SeqFile, read SeqFile, create TFile, read TFile, create TFile, read TFile,
create SeqFile, read SeqFile (i.e., each task twice). Then I pick the better
performance of the two runs for each task. This removes possible effects from
JVM hotspot compilation, garbage collection, and/or occasional host-related
activities (I have seen tar and yum show up in top from time to time).
- Memory footprint: SeqFile needs to cache a full block of uncompressed data
and a full block of compressed data, each on the order of 1MB, so its total
buffering is about 2MB. For TFile, the block size is set to 10MB, but the
amount of buffering is 4KB (buffering for small writes before the
compression/decompression stream) + 256KB (FS read/write buffering) + 1MB (for
writes when the value length is not known up front; this is also tunable), or
roughly 1.3MB in total. The two are comparable, but TFile's buffering is more
tunable.
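For reference, the key/value composition described above can be sketched as
follows (a minimal sketch under the stated settings; the class and method
names are hypothetical, not the benchmark code):
{code}
import java.util.Random;
import org.apache.hadoop.io.BytesWritable;

// Hypothetical sketch of the key/value composition: a fixed dictionary of
// 1000 random "words" of 5-20B each, concatenated up to the target length.
class KVGenerator {
  private final Random rng;                       // same seed for every run
  private final byte[][] dict = new byte[1000][];

  public KVGenerator(long seed) {
    rng = new Random(seed);
    for (int i = 0; i < dict.length; ++i) {
      dict[i] = new byte[5 + rng.nextInt(16)];    // word length in [5, 20]
      rng.nextBytes(dict[i]);
    }
  }

  // Fills buf with dictionary words up to a length drawn from [min, max];
  // keys use [50, 100], values use [5K, 10K].
  public void fill(BytesWritable buf, int min, int max) {
    int len = min + rng.nextInt(max - min + 1);
    buf.setSize(0);
    while (buf.getLength() < len) {
      byte[] word = dict[rng.nextInt(dict.length)];
      int n = Math.min(word.length, len - buf.getLength());
      int off = buf.getLength();
      buf.setSize(off + n);
      System.arraycopy(word, 0, buf.getBytes(), off, n);
    }
  }
}
{code}
Because the Random is seeded identically, the SeqFile and TFile runs compose
an identical key/value stream, so composition cost is the same constant in
both timings.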
Finally, the results (Eff BW is measured against the uncompressed data size;
I/O BW against the on-disk file size):
|| || SeqFile-none || TFile-none || SeqFile-lzo || TFile-lzo || SeqFile-gz || TFile-gz ||
| Data sizes (MB) | 4435.98 | 4435.98 | 22179.13 | 22179.13 | 22179.13 | 22179.13 |
| File sizes (MB) | 4456.58 | 4438.31 | 10080.23 | 10063.48 | 6236.91 | 5943.07 |
| Write Eff BW (MB/s) | 36.86 | 35.53 | 38.46 | 39.97 | 13.59 | 13.54 |
| Write I/O BW (MB/s) | 37.03 | 35.54 | 17.48 | 18.14 | 3.82 | 3.63 |
| Read Eff BW (MB/s) | 41.13 | 40.16 | 86.77 | 91.04 | 52.73 | 75.15 |
| Read I/O BW (MB/s) | 41.32 | 40.18 | 39.44 | 41.31 | 14.83 | 20.14 |
Things to notice:
- In most cases, SeqFile and TFile performance are similar.
- TFile sizes are usually smaller than SeqFile sizes: SeqFile encodes each
length using 4B, while TFile uses VInt (see the size sketch at the end of this
comment). The setup of the benchmark favors TFile in this regard; fixed
key/value sizes may make SeqFile smaller.
- For no compression, SeqFile outperforms TFile, because both need only one
layer of buffering and the TFile interface requires the creation of more small
objects to set up.
- For lzo and gz compression, TFile outperforms SeqFile due to the reduction
of extra buffer copying.
- Particularly for gz compression, the read performance of TFile is 42%
faster. The reason is that DecompressorStream always reads as much data as
possible from the downstream using its internal buffer size, and I took
advantage of that by skipping my own FS buffering, so bulk reads (reading a
value) incur the least amount of buffer copying (see the sketch after this
list). This is not true for lzo, which uses block compression; its
BlockDecompressorStream always reads small blocks on the order of 20KB.
Reduced buffer copying saves CPU cycles, and in the gz case CPU is the
bottleneck.
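To illustrate the FS-buffering point, the gz read path can layer the
decompressor directly over the raw file stream, roughly as below (a sketch of
the idea only, not the actual TFile code):
{code}
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

class DirectGzRead {
  // DecompressorStream issues large reads against whatever stream sits below
  // it, so inserting a BufferedInputStream in between only adds one more copy
  // on the path of a bulk (multi-KB) value read.
  static InputStream open(FileSystem fs, Path path, Configuration conf)
      throws IOException {
    CompressionCodec codec = ReflectionUtils.newInstance(GzipCodec.class, conf);
    return codec.createInputStream(fs.open(path));  // no extra buffer layer
  }

  // Bulk value read: pulls exactly len bytes straight out of the
  // decompressor's internal buffer into the caller's buffer.
  static void readFully(InputStream in, byte[] buf, int len)
      throws IOException {
    for (int off = 0; off < len; ) {
      int n = in.read(buf, off, len - off);
      if (n < 0) throw new EOFException();
      off += n;
    }
  }
}
{code}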
The above results are still preliminary. No YourKit profiling has been done on
the TFile side, and results could vary under different settings: different
key/value lengths, compression ratios, underlying I/O speeds, etc.
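As a footnote to the file-size comparison above, WritableUtils shows the VInt
cost for the lengths used in this benchmark (a quick sketch; the two in-range
lengths are chosen arbitrarily):
{code}
import org.apache.hadoop.io.WritableUtils;

public class VIntSizes {
  public static void main(String[] args) {
    // A key length in [50, 100] fits in a single VInt byte; a value length
    // in [5K, 10K] takes three bytes; a fixed encoding takes 4B either way.
    System.out.println(WritableUtils.getVIntSize(75));    // prints 1
    System.out.println(WritableUtils.getVIntSize(7500));  // prints 3
  }
}
{code}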
> New binary file format
> ----------------------
>
> Key: HADOOP-3315
> URL: https://issues.apache.org/jira/browse/HADOOP-3315
> Project: Hadoop Core
> Issue Type: New Feature
> Components: io
> Reporter: Owen O'Malley
> Assignee: Amir Youssefi
> Attachments: HADOOP-3315_20080908_TFILE_PREVIEW_WITH_LZO_TESTS.patch,
> HADOOP-3315_20080915_TFILE.patch, TFile Specification Final.pdf
>
>
> SequenceFile's block compression format is too complex and requires 4 codecs
> to compress or decompress. It would be good to have a file format that only
> needs