My 10x was very rough. I based it on:
a) you want a few files per map task
b) you want a map task per core

I tend to use quad-core machines, so a couple of files per core gives 2 x 4 = 8, call it roughly 10. On EC2 you don't have multi-core machines (I think), so you might be fine with 2-4 files per CPU.

-----Original Message-----
From: C G [mailto:[EMAIL PROTECTED]]
Sent: Fri 8/31/2007 11:21 AM
To: hadoop-user@lucene.apache.org
Subject: RE: Compression using Hadoop...

> Ted, from what you are saying I should be using at least 80 files given
> the cluster size, and I should modify the loader to be aware of the
> number of nodes and split accordingly. Do you concur?
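
P.S. To make the arithmetic concrete, here is a minimal Java sketch of that rule of thumb. The class and method names are mine, not anything from Hadoop, and the 8-node cluster size is just an assumption to match your 80-file figure:

// Hypothetical sketch of the sizing rule of thumb above -- not Hadoop code.
// Roughly: files = nodes * (map tasks per node, ~one per core) * (files per task).
public class SplitSizing {

    /** Rough number of files to split a load into. */
    static int targetFileCount(int nodes, int coresPerNode, int filesPerCore) {
        return nodes * coresPerNode * filesPerCore;
    }

    public static void main(String[] args) {
        // An assumed 8-node cluster of quad-core machines at 2-3 files per
        // core lands in the 64-96 range, i.e. near the "at least 80" above.
        System.out.println(targetFileCount(8, 4, 2)); // 64
        System.out.println(targetFileCount(8, 4, 3)); // 96
        // Single-core EC2 instances: 2-4 files per CPU is plenty.
        System.out.println(targetFileCount(8, 1, 4)); // 32
    }
}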