Hi Marco,

It's the Sun JVM on both:

Ubuntu:

java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

CentOS:

java version "1.6.0_16"
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)


On Wed, Jun 15, 2011 at 4:32 PM, Marco Neumann <[email protected]> wrote:

> What JVM do you use on the machines?
>
>
> On Wed, Jun 15, 2011 at 11:23 AM, Richard Francis <[email protected]> wrote:
> > Hi,
> >
> > I'm using two identical machines in EC2 running tdbloader on CentOS
> > (CentOS release 5 (Final)) and Ubuntu 11.04 (natty).
> >
> > I've observed an issue where CentOS will run happily at a consistent
> > speed and complete a load of 650 million triples in around 12 hours,
> > whereas the load on Ubuntu tails off after just 15 million triples and
> > gets progressively slower from there.
> >
> > On initial observation of the Ubuntu machine I noticed that the flush-202
> > process was running quite high. Running iostat also showed that I/O was
> > the real bottleneck: the Ubuntu machine showed constant use of the disk
> > for both reads and writes, whereas the CentOS machine had periods of no
> > usage followed by periods of writes. This led me to investigate how
> > memory was being used by the Ubuntu machine, and a few blog posts /
> > tutorials later I found a couple of settings to tweak. The first I tried
> > was dirty_writeback_centisecs. Setting this to 0 had an immediate
> > positive effect on the load that I was performing, but after some more
> > testing I found that the problem was just pushed back: the drop-off in
> > performance now appeared at around 80 million triples.
> >
> > This led me to investigate whether there was the same issue with
> > tdbloader2. From my observations I got the same problem, but this time
> > at around 150 million triples.
> >
> > Again I focused on the "dirty" settings, and this time setting
> > dirty_bytes = 30000000000 and dirty_background_bytes = 15000000000 gave a
> > massive performance increase; for the vast majority of the add phase of
> > the tdbloader it kept up with the CentOS machine.
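
For anyone wanting to compare the two machines, here is a minimal sketch
for dumping the writeback tunables discussed above. It assumes the usual
Linux location /proc/sys/vm; the function name and the extra keys beyond
the three named in this thread are my own choice for illustration. Some of
these files may be missing on older kernels, in which case the sketch
reports None.

```python
from pathlib import Path

# Writeback tunables mentioned in this thread, plus the ratio-based and
# expiry settings that usually sit alongside them (an assumption about
# what is worth comparing, not something measured here).
DIRTY_KEYS = (
    "dirty_bytes",
    "dirty_background_bytes",
    "dirty_ratio",
    "dirty_background_ratio",
    "dirty_writeback_centisecs",
    "dirty_expire_centisecs",
)


def read_dirty_settings(base="/proc/sys/vm"):
    """Return {name: int or None} for each tunable under base."""
    settings = {}
    for key in DIRTY_KEYS:
        path = Path(base) / key
        try:
            settings[key] = int(path.read_text().strip())
        except (OSError, ValueError):
            # File absent, unreadable, or not a plain integer.
            settings[key] = None
    return settings


if __name__ == "__main__":
    for name, value in read_dirty_settings().items():
        print(f"{name} = {value}")
```

Running this on both the CentOS and Ubuntu machines and diffing the output
should show whether the defaults differ before any tweaking.
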
> >
> > Finally, last night I stopped all loads and raced the CentOS machine
> > against the Ubuntu machine. Both have completed, but the CentOS machine
> > (around 12 hours) was still far quicker than the Ubuntu machine (20
> > hours).
> >
> > So my questions are: has anyone else observed this? Can anyone suggest
> > any further improvements or things to try? What is the best OS to
> > perform a tdbload on?
> >
> > Rich
> >
> >
> > Tests were performed on three different machines (1 x CentOS and 2 x
> > Ubuntu) to rule out EC2 being a bottleneck. All were of the following
> > type (from http://aws.amazon.com/ec2/instance-types/):
> >
> > High-Memory Double Extra Large Instance
> >
> > 34.2 GB of memory
> > 13 EC2 Compute Units (4 virtual cores with 3.25 EC2 Compute Units each)
> > 850 GB of instance storage
> > 64-bit platform
> > I/O Performance: High
> > API name: m2.2xlarge
> > All machines are configured with no swap
> >
> > Here's the summary from the only completed load on Ubuntu:
> >
> > ** Index SPO->OSP: 685,552,449 slots indexed in 18,337.75 seconds [Rate: 37,384.76 per second]
> > -- Finish triples index phase
> > ** 685,552,449 triples indexed in 37,063.51 seconds [Rate: 18,496.69 per second]
> > -- Finish triples load
> > ** Completed: 685,552,449 triples loaded in 78,626.27 seconds [Rate: 8,719.13 per second]
> > -- Finish quads load
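
As a quick sanity check on the summary above, the rates are just the
triple count divided by elapsed seconds. A small sketch (the helper name
and phase labels are my own, the numbers are copied from the log) that
recomputes them and converts the elapsed times to hours:

```python
TRIPLES = 685_552_449

# Elapsed seconds per phase, copied from the loader summary above.
PHASES = {
    "index SPO->OSP": 18_337.75,
    "triples index phase": 37_063.51,
    "complete load": 78_626.27,
}


def per_second(count, seconds):
    """Average throughput over the whole phase."""
    return count / seconds


for name, seconds in PHASES.items():
    rate = per_second(TRIPLES, seconds)
    print(f"{name}: {rate:,.2f} per second over {seconds / 3600:.1f} hours")
```

The recomputed figures should agree with the logged rates to within
rounding of the elapsed times. Note these are averages over each phase, so
they hide the point at which the load starts to tail off.
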
> >
> > Some resources I used:
> > http://www.westnet.com/~gsmith/content/linux-pdflush.htm
> > http://arighi.blogspot.com/2008/10/fine-grained-dirtyratio-and.html
> >
>
>
>
> --
> Marco Neumann
> KONA
>
