Hi Andy,

Both file systems are the same - we're using the ephemeral storage on the
EC2 nodes, and both machines are ext3:

Ubuntu:

df -Th /mnt
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/xvdb     ext3    827G  240G  545G  31% /mnt

CentOS:

df -Th /mnt
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/sdb      ext3    827G  191G  595G  25% /mnt

Both the input N-Triples and the output indexes are written to this
partition.

meminfo does show some differences - I believe mainly because the Ubuntu
instance runs a later kernel (2.6.38-8-virtual vs. 2.6.16.33-xenU). There does
seem to be a difference between the Mapped values, and I think I should
investigate the HugePages & DirectMap settings.
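
For what it's worth, this is roughly what I plan to run on both boxes to
compare those settings (standard sysctl name and sysfs path, assuming they
exist on these kernels):

  grep -E 'HugePages|Hugepagesize|DirectMap' /proc/meminfo
  sysctl vm.nr_hugepages
  # transparent hugepages only appeared around 2.6.38, so likely Ubuntu-only:
  cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null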

Ubuntu:
cat /proc/meminfo
MemTotal:       35129364 kB
MemFree:          817100 kB
Buffers:           70780 kB
Cached:         32674868 kB
SwapCached:            0 kB
Active:         17471436 kB
Inactive:       15297084 kB
Active(anon):      25752 kB
Inactive(anon):       44 kB
Active(file):   17445684 kB
Inactive(file): 15297040 kB
Unevictable:        3800 kB
Mlocked:            3800 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:             10664 kB
Writeback:             0 kB
AnonPages:         26808 kB
Mapped:             7012 kB
Shmem:               176 kB
Slab:             855516 kB
SReclaimable:     847652 kB
SUnreclaim:         7864 kB
KernelStack:         680 kB
PageTables:         2044 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    17564680 kB
Committed_AS:      39488 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      114504 kB
VmallocChunk:   34359623800 kB
HardwareCorrupted:     0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:    35848192 kB
DirectMap2M:           0 kB

***************************************************
CentOS:
cat /proc/meminfo
MemTotal:     35840000 kB
MemFree:         31424 kB
Buffers:        166428 kB
Cached:       34658344 kB
SwapCached:          0 kB
Active:        1033384 kB
Inactive:     33803304 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:     35840000 kB
LowFree:         31424 kB
SwapTotal:           0 kB
SwapFree:            0 kB
Dirty:             220 kB
Writeback:           0 kB
Mapped:          17976 kB
Slab:           223256 kB
CommitLimit:  17920000 kB
Committed_AS:    38020 kB
PageTables:       1528 kB
VmallocTotal: 34359738367 kB
VmallocUsed:       164 kB
VmallocChunk: 34359738203 kB

Thanks,
Rich

On Wed, Jun 15, 2011 at 10:07 PM, Andy Seaborne <[email protected]> wrote:
>
> > So my questions are, has anyone else observed this? - can anyone suggest
> > any further improvements - or things to try? - what is the best OS to
> > perform a tdbload on?
>
> Richard - very useful feedback, thank you.
>
> I haven't come across this before - and the difference is quite
> surprising.
>
> What is the "mapped" value on each machine?
> Could you "cat /proc/meminfo"?
>
> TDB is using memory mapped files - I'm wondering if the amount of RAM
> available to the processes is different in some way.  Together with the
> parameters you have found to have an effect, this might explain it
> (speculation I'm afraid).
>
> Is the filesystem the same?
> How big is the resulting dataset?
>
> (sorry for all the questions!)
>
> tdbloader2 works differently from tdbloader even during the data phase. It
> seems like it is the B+trees slowing down; there is only one in tdbloader2
> phase one, but two in tdbloader phase one.  That might explain the roughly
> 80 -> 150 million (or x2).
>
>        Andy
>
> On 15/06/11 16:23, Richard Francis wrote:
>>
>> Hi,
>>
>> I'm using two identical machines in EC2, running tdbloader on CentOS
>> (release 5 (Final)) and Ubuntu 11.04 (natty).
>>
>> I've observed an issue where CentOS will run happily at a consistent speed
>> and complete a load of 650 million triples in around 12 hours, whereas the
>> load on Ubuntu tails off after just 15 million triples and then runs ever
>> more slowly.
>>
>> On initial observation of the Ubuntu machine I noticed that the flush-202
>> process was running quite high, and iostat showed that I/O was the real
>> bottleneck - the Ubuntu machine showed constant use of the disk for both
>> reads and writes (the CentOS machine had periods of no usage followed by
>> periods of writes). This led me to investigate how memory was being used by
>> the Ubuntu machine, and a few blog posts / tutorials later I found a couple
>> of settings to tweak. The first I tried was dirty_writeback_centisecs -
>> setting this to 0 had an immediate positive effect on the load I was
>> performing, but after some more testing I found that it just pushed the
>> problem back to around 80 million triples before I saw the drop-off in
>> performance.
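
(For reference, I set it with something along these lines - just the standard
vm.* sysctl, nothing exotic:

  # stop the periodic writeback timer entirely (takes effect immediately)
  sudo sysctl -w vm.dirty_writeback_centisecs=0
  # equivalently, via /proc:
  echo 0 | sudo tee /proc/sys/vm/dirty_writeback_centisecs

This only lasts until reboot unless it also goes into /etc/sysctl.conf.)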
>>
>> This led me to investigate whether there was the same issue with tdbloader2.
>> From my observations I got the same problem - but this time at around 150
>> million triples.
>>
>> Again I focused on the "dirty" settings - and this time tweaking
>> dirty_bytes = 30000000000 and dirty_background_bytes = 15000000000 gave a
>> massive performance increase, and for most of the add phase of tdbloader it
>> kept up with the CentOS machine.
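
(Again for reference, something along these lines - the values are the ones
above, set via the standard vm.* sysctls:

  sudo sysctl -w vm.dirty_bytes=30000000000
  sudo sysctl -w vm.dirty_background_bytes=15000000000
  # check the current values:
  sysctl vm.dirty_bytes vm.dirty_background_bytes

As I understand the kernel docs, setting the *_bytes variants overrides the
corresponding vm.dirty_ratio / vm.dirty_background_ratio settings.)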
>>
>> Finally, last night I stopped all loads and raced the CentOS machine
>> against the Ubuntu machine. Both have completed, but the CentOS machine
>> (around 12 hours) was still far quicker than the Ubuntu machine (20 hours).
>>
>> So my questions are, has anyone else observed this? - can anyone suggest
>> any further improvements - or things to try? - what is the best OS to
>> perform a tdbload on?
>>
>> Rich
>>
>>
>> Tests were performed on three different machines, 1x CentOS and 2x Ubuntu,
>> to rule out EC2 being a bottleneck. All were (from
>> http://aws.amazon.com/ec2/instance-types/):
>>
>> High-Memory Double Extra Large Instance
>>
>> 34.2 GB of memory
>> 13 EC2 Compute Units (4 virtual cores with 3.25 EC2 Compute Units each)
>> 850 GB of instance storage
>> 64-bit platform
>> I/O Performance: High
>> API name: m2.2xlarge
>> All machines are configured with no swap
>>
>> Here's the summary from the only completed load on Ubuntu:
>>
>> ** Index SPO->OSP: 685,552,449 slots indexed in 18,337.75 seconds [Rate:
>> 37,384.76 per second]
>> -- Finish triples index phase
>> ** 685,552,449 triples indexed in 37,063.51 seconds [Rate: 18,496.69 per
>> second]
>> -- Finish triples load
>> ** Completed: 685,552,449 triples loaded in 78,626.27 seconds [Rate:
>> 8,719.13 per second]
>> -- Finish quads load
>>
>> Some resources I used;
>> http://www.westnet.com/~gsmith/content/linux-pdflush.htm
>> http://arighi.blogspot.com/2008/10/fine-grained-dirtyratio-and.html
>>
