Which JVM are you using on the machines?

On Wed, Jun 15, 2011 at 11:23 AM, Richard Francis <[email protected]> wrote:
> Hi,
>
> I'm using two identical machines in EC2, running tdbloader on CentOS (CentOS
> release 5 (Final)) and Ubuntu 11.04 (natty).
>
> I've observed an issue where CentOS will run happily at a consistent speed
> and complete a load of 650 million triples in around 12 hours, whereas the
> load on Ubuntu tails off after just 15 million triples and then gets
> progressively slower.
>
> On initial observation of the Ubuntu machine I noticed that the flush-202
> process was running quite high, and iostat showed that I/O was the
> real bottleneck: the Ubuntu machine showed constant use of the disk for
> both reads and writes, whereas the CentOS machine had periods of no usage
> followed by periods of writes. This led me to investigate how memory was
> being used by the Ubuntu machine - and a few blog posts / tutorials later I
> found a couple of settings to tweak. The first I tried
> was dirty_writeback_centisecs - setting this to 0 had an immediate positive
> effect on the load that I was performing - but after some more testing I
> found that the problem was just pushed back to around 80 million triples
> before I saw a drop-off in performance.
>
> This led me to investigate whether tdbloader2 had the same issue. From my
> observations I got the same problem, but this time at around 150 million
> triples.
>
> Again I focused on the "dirty" settings - and this time setting dirty_bytes
> = 30000000000 and dirty_background_bytes = 15000000000 gave a massive
> performance increase, and for most of the add phase of tdbloader the Ubuntu
> machine kept up with the CentOS machine.
>
> Finally, last night I stopped all loads and raced the CentOS machine and
> the Ubuntu machine. Both have completed, but the CentOS machine (around 12
> hours) was still far quicker than the Ubuntu machine (20 hours).
>
> So my questions are: has anyone else observed this? Can anyone suggest any
> further improvements or things to try? What is the best OS to perform a
> tdbload on?
>
> Rich
>
>
> Tests were performed on three different machines (1 x CentOS and 2 x Ubuntu)
> to rule out EC2 being a bottleneck. All were (from
> http://aws.amazon.com/ec2/instance-types/)
>
> High-Memory Double Extra Large Instance
>
> 34.2 GB of memory
> 13 EC2 Compute Units (4 virtual cores with 3.25 EC2 Compute Units each)
> 850 GB of instance storage
> 64-bit platform
> I/O Performance: High
> API name: m2.2xlarge
> All machines are configured with no swap
>
> Here's the summary from the only completed load on Ubuntu:
>
> ** Index SPO->OSP: 685,552,449 slots indexed in 18,337.75 seconds [Rate:
> 37,384.76 per second]
> -- Finish triples index phase
> ** 685,552,449 triples indexed in 37,063.51 seconds [Rate: 18,496.69 per
> second]
> -- Finish triples load
> ** Completed: 685,552,449 triples loaded in 78,626.27 seconds [Rate:
> 8,719.13 per second]
> -- Finish quads load
>
> Some resources I used:
> http://www.westnet.com/~gsmith/content/linux-pdflush.htm
> http://arighi.blogspot.com/2008/10/fine-grained-dirtyratio-and.html
>
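
Since the tuning above centres on the kernel's "dirty" writeback settings, it may help to capture them on both machines and compare. The following is only a rough sketch, assuming the settings are exposed as files under /proc/sys/vm (the usual location on Linux); the helper names read_dirty_settings and diff_settings are made up for illustration. Older kernels may not expose every file, in which case the value is reported as None.

```python
from pathlib import Path

# Writeback tunables mentioned in the thread, plus their ratio-based siblings.
DIRTY_KEYS = (
    "dirty_bytes",
    "dirty_background_bytes",
    "dirty_ratio",
    "dirty_background_ratio",
    "dirty_writeback_centisecs",
    "dirty_expire_centisecs",
)


def read_dirty_settings(base="/proc/sys/vm"):
    """Return {name: int or None} for each writeback tunable under base."""
    settings = {}
    for key in DIRTY_KEYS:
        try:
            settings[key] = int((Path(base) / key).read_text().strip())
        except (OSError, ValueError):
            # File missing on this kernel, unreadable, or not a plain integer.
            settings[key] = None
    return settings


def diff_settings(a, b):
    """Return {name: (a_value, b_value)} for tunables that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    for name, value in read_dirty_settings().items():
        print(f"vm.{name} = {value}")
```

Running this on the CentOS and Ubuntu machines and passing the two resulting dictionaries to diff_settings would show which defaults differ before any tuning is applied.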



-- 
Marco Neumann
KONA
