I imported 9 billion rows in about 5 days, with a few minor crashes, at an average speed of 50-200k ops/sec. The client needs some love to make it more efficient at grouping commits during bulk upload.
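To illustrate what "grouping commits" means on the client side, here is a minimal sketch in Python. It is not the actual HBase client code. It assumes a hypothetical `commit_batch` callback standing in for whatever bulk-write call the real client exposes, and a hypothetical `BatchingWriter` class that buffers rows locally and sends them in groups, so each round trip carries many rows instead of one.

```python
from typing import Callable, List, Tuple

# Hypothetical row shape: (row key, value). Not an HBase type.
Row = Tuple[str, bytes]


class BatchingWriter:
    """Buffers writes client-side and commits them in groups.

    Hypothetical helper: `commit_batch` stands in for the real client's
    bulk write call; it receives one list of rows per commit.
    """

    def __init__(self, commit_batch: Callable[[List[Row]], None], batch_size: int = 1000):
        if batch_size < 1:
            raise ValueError("batch_size must be >= 1")
        self._commit_batch = commit_batch
        self._batch_size = batch_size
        self._buffer: List[Row] = []
        self.round_trips = 0  # number of commits issued so far

    def put(self, key: str, value: bytes) -> None:
        # Queue the row locally; commit only once a full batch has accumulated.
        self._buffer.append((key, value))
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        # Send whatever is buffered (possibly a partial batch) in one commit.
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        self._commit_batch(batch)
        self.round_trips += 1

    def __enter__(self) -> "BatchingWriter":
        return self

    def __exit__(self, *exc) -> None:
        # Always flush the tail on exit; exceptions are not suppressed.
        self.flush()
```

With `batch_size=1000`, writing 10,000 rows issues 10 commits rather than 10,000. The trade-off is that buffered rows are not durable until `flush()` runs, so a client crash can lose up to one batch.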
On Jun 27, 2009 4:02 PM, "Andrew Purtell" wrote:

Test:
- Latest trunk.
- Config modified only with a store file split threshold of 1GB.
- 4 node testbed:
  1) namenode, datanode, hmaster, heritrix, jobtracker
  2) datanode, regionserver, heritrix, tasktracker, mapper (2)
  3) datanode, regionserver, heritrix, tasktracker, mapper (2)
  4) datanode, regionserver, heritrix, tasktracker, mapper (2)
- 100 heritrix threads
  - 4 hosts, 25 threads each
  - feeding in ~5MB/sec average new edits
- 2 mappers x 3 hosts processing new edits and writing back serialized/compressed Documents
- 3K average transactions/sec reported by master
- 'hadoop balancer -threshold 0.1'
- 1 hour test run

Result: Passed with no incidents!

- Andy
