If there is a hard requirement that an input split be a single block, you could just make your input split fit a smaller block size.
Just saying, in case you can't overcome the 2G ceiling :)

Sent from my mobile. Please excuse the typos.

On 2010-10-18, at 5:08 PM, "elton sky" <[email protected]> wrote:

>> Why would you want to use a block size of > 2GB?
> For keeping a map's input split in a single block~
>
> On Tue, Oct 19, 2010 at 9:07 AM, Michael Segel <[email protected]> wrote:
>
>> Ok, I'll bite.
>> Why would you want to use a block size of > 2GB?
>>
>>> Date: Mon, 18 Oct 2010 21:33:34 +1100
>>> Subject: BUG: Anyone use block size more than 2GB before?
>>> From: [email protected]
>>> To: [email protected]
>>>
>>> Hello,
>>>
>>> In org.apache.hadoop.hdfs.DFSClient.DFSOutputStream.writeChunk(byte[] b, int offset, int len, byte[] checksum),
>>> the second-to-last line is:
>>>
>>> int psize = Math.min((int)(blockSize-bytesCurBlock), writePacketSize);
>>>
>>> When I use a blockSize bigger than 2GB, which is outside the range of an int,
>>> something weird happens. For example, a 3GB block will create more than
>>> 2 million packets.
>>>
>>> Anyone noticed this before?
>>>
>>> Elton
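
For reference, a minimal standalone sketch of the wrap-around Elton is describing (the class name and the plugged-in values are illustrative; 64KB is the default writePacketSize). Once blockSize exceeds 2GB, the long difference no longer fits in an int, the cast wraps negative, and Math.min returns the negative value instead of writePacketSize:

    public class BlockSizeOverflowDemo {
        public static void main(String[] args) {
            long blockSize = 3L * 1024 * 1024 * 1024; // 3GB, beyond Integer.MAX_VALUE
            long bytesCurBlock = 0;                   // start of a fresh block
            int writePacketSize = 64 * 1024;          // default packet size, 64KB

            // The difference is 3GB, but an int tops out at ~2.1GB, so the
            // cast keeps only the low 32 bits and the value wraps negative.
            int truncated = (int) (blockSize - bytesCurBlock);
            System.out.println(truncated); // -1073741824

            // Math.min then picks the negative value instead of writePacketSize.
            int psize = Math.min(truncated, writePacketSize);
            System.out.println(psize); // -1073741824, not 65536
        }
    }

With a negative psize, the chunks-per-packet calculation presumably bottoms out at one 512-byte chunk per packet, which would account for the millions of tiny packets Elton saw on a 3GB block.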

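A minimal sketch of the workaround suggested at the top of the thread, assuming the new (org.apache.hadoop.mapreduce) API; the class name, input path, and 1.5GB figure are illustrative. The idea is to keep the block size under the 2GB int ceiling and cap the split size to match, so each map's split still lands in a single block:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class SplitFitsBlockDemo {
        public static void main(String[] args) throws Exception {
            long blockSize = 1536L * 1024 * 1024; // 1.5GB: under the 2GB int ceiling

            Configuration conf = new Configuration();
            // Client-side setting (pre-2.0 property name): files written by
            // this job get the smaller block size.
            conf.setLong("dfs.block.size", blockSize);

            Job job = new Job(conf, "split-fits-block");
            FileInputFormat.addInputPath(job, new Path("/data/input")); // illustrative path
            // Cap the split size at the block size so no split spans two blocks.
            FileInputFormat.setMaxInputSplitSize(job, blockSize);
        }
    }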