Unfortunately, I can only post some snippets.

I have no region splits (I insert just 100,000 rows, so there is no split, except when I don't use compression).

I use HBase 0.20.2, and to insert I use HTable.put(List<Put>).

The only difference between my 3 tests is the way I create the test table:

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.hfile.Compression.Algorithm;
import org.apache.hadoop.hbase.util.Bytes;

HBaseAdmin admin = new HBaseAdmin(config);

HTableDescriptor desc = new HTableDescriptor(name);

HColumnDescriptor colDesc;

colDesc = new HColumnDescriptor(Bytes.toBytes("meta:"));
colDesc.setMaxVersions(1);
colDesc.setCompressionType(Algorithm.GZ); // Algorithm.LZO or Algorithm.NONE in the other two tests
desc.addFamily(colDesc);

colDesc = new HColumnDescriptor(Bytes.toBytes("data:"));
colDesc.setMaxVersions(1);
colDesc.setCompressionType(Algorithm.GZ); // Algorithm.LZO or Algorithm.NONE in the other two tests
desc.addFamily(colDesc);

admin.createTable(desc);

A typical inserted row is made of 13 columns with short content, as shown in this scan output:

1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730 column=data:accuracy, timestamp=1267006115356, value=1317
1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730 column=data:alt, timestamp=1267006115356, value=0
1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730 column=data:country, timestamp=1267006115356, value=France
1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730 column=data:countrycode, timestamp=1267006115356, value=FR
1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730 column=data:lat, timestamp=1267006115356, value=48.65869706
1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730 column=data:locality, timestamp=1267006115356, value=Morsang-sur-Orge
1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730 column=data:lon, timestamp=1267006115356, value=2.36138182
1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730 column=data:postalcode, timestamp=1267006115356, value=91390
1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730 column=data:region, timestamp=1267006115356, value=Ile-de-France
1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730 column=meta:imei, timestamp=1267006115356, value=6ffc3fe659023a3c9cfed0a50a9f199ed42f2730
1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730 column=meta:infoid, timestamp=1267006115356, value=ca30781e0c375a1236afbf323cbfa40dc2c7c7af
1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730 column=meta:locid, timestamp=1267006115356, value=5e15a0281e83cfe55ec1c362f84a39f006f18128
1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730 column=meta:timestamp, timestamp=1267006115356, value=1264761195240
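
For reference, building one of these rows looks roughly like this (a simplified sketch: the row key and values are hardcoded here just to mirror the scan output above; in the real code they come from our data objects, and 'puts' is the batch list):

Put put = new Put(Bytes.toBytes("1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730"));
put.add(Bytes.toBytes("data"), Bytes.toBytes("country"), Bytes.toBytes("France"));
put.add(Bytes.toBytes("data"), Bytes.toBytes("lat"), Bytes.toBytes("48.65869706"));
put.add(Bytes.toBytes("data"), Bytes.toBytes("lon"), Bytes.toBytes("2.36138182"));
put.add(Bytes.toBytes("meta"), Bytes.toBytes("timestamp"), Bytes.toBytes("1264761195240"));
// ... and so on for the other 9 columns shown above ...
puts.add(put); // 'puts' is the List<Put> later passed to HTable.put(List<Put>)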

Maybe LZO works much better with fewer rows that have bigger content?

On 24/02/10 19:10, Jean-Daniel Cryans wrote:
Are you able to post the code used for the insertion? It could be
something with your usage pattern or something wrong with the code
itself.

How many rows are you inserting? Do you even have some region splits?

J-D

On Wed, Feb 24, 2010 at 1:52 AM, Vincent Barat <vincent.ba...@ubikod.com> wrote:
Yes, of course.

We use a 4-machine cluster (4 large instances on AWS): 8 GB RAM each, dual-core CPU. One machine hosts the Hadoop and HBase namenode / masters, and 3 host the datanodes / regionservers.

The table used for testing is first created, then I insert a set of rows sequentially and count the number of rows inserted per second.

I insert rows in batches of 1,000, using HTable.put(List<Put>) (see the sketch below).
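
Concretely, the benchmark loop looks more or less like this (a sketch; makePut() is a stand-in for our per-row Put construction, and 'table' is an HTable opened on the test table):

List<Put> batch = new ArrayList<Put>(1000);
long start = System.currentTimeMillis();
for (int i = 0; i < 100000; i++) {
  batch.add(makePut(i));
  if (batch.size() == 1000) {
    table.put(batch); // HTable.put(List<Put>)
    batch.clear();
  }
}
long elapsed = System.currentTimeMillis() - start;
System.out.println((100000L * 1000 / elapsed) + " inserts/s");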

When reading, I also read sequentially, using a scanner with scanner caching set to 1024 rows.
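
The read loop is essentially this (a sketch; 'table' is the same HTable as above, error handling omitted):

Scan scan = new Scan();
scan.setCaching(1024); // the scanner caching mentioned above

ResultScanner scanner = table.getScanner(scan);
int count = 0;
try {
  for (Result row : scanner) {
    count++; // sequential read: we only count rows
  }
} finally {
  scanner.close();
}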

Maybe our installation of LZO is not good?
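
One way to check would be the compression test utility that ships with HBase (I am quoting the class name and usage from memory, so please double-check it against 0.20.2; the HDFS path here is just an example):

./bin/hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://namenode:9000/tmp/lzo-test lzo

If that fails on the regionservers, the LZO native libraries are probably not being picked up.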


On 23/02/10 22:15, Jean-Daniel Cryans wrote:

Vincent,

I don't expect that either. Can you give us more info about your test environment?

Thx,

J-D

On Tue, Feb 23, 2010 at 10:39 AM, Vincent Barat <vincent.ba...@ubikod.com> wrote:

Hello,

I did some testing to figure out which compression algorithm I should use for my HBase tables. I thought that LZO was the best candidate, but it appears to be the worst one.

I used one table with 2 families and 10 columns. Each row has a total of 200 to 400 bytes.

Here are my results:

GZIP:           2600 to 3200 inserts/s   12000 to 15000 reads/s
NO COMPRESSION: 2000 to 2600 inserts/s    4900 to 5020 reads/s
LZO:            1600 to 2100 inserts/s    4020 to 4600 reads/s

Do you have an explanation for this? I thought that LZO compression was always faster at compression and decompression than GZIP.
