large dfs.block.size doesn't work
---------------------------------

                 Key: HADOOP-5495
                 URL: https://issues.apache.org/jira/browse/HADOOP-5495
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.19.1
         Environment: Linux ... 2.6.27.12-170.2.5.fc10.x86_64 #1 SMP Wed Jan 21 01:33:24 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
java version "1.6.0_12"
Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
Java HotSpot(TM) 64-Bit Server VM (build 11.2-b01, mixed mode)

            Reporter: mike andrews
            Priority: Minor


My motivation for trying a large dfs.block.size was to enforce locality for large blobs in the multi-gigabyte range: these are files that must be processed in one step rather than broken up into records, as in a typical map/reduce workflow.

I tried "-put" followed by "-cat" on a 1.6 GB file and it worked fine, but on a 16.4 GB file ("bigfile.dat") I get the errors below. The failure occurred both times I tried it, each time with a fresh single-node 0.19.1 install. I set the block size to 32 GB, with larger buffer and checksum sizes, in the config (also shown below); the locality check sketched next can confirm the file occupies a single block.
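
Since the whole point of the large block size is locality, one way to verify that the file actually landed as a single block (and see which datanode holds it) is via the FileSystem API. A minimal sketch, assuming the stock 0.19 client API and the /bigfile.dat path used below:

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Lists each block of a file with the datanodes that hold it; with
// dfs.block.size larger than the file, exactly one block should print.
public class ShowBlocks {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus st = fs.getFileStatus(new Path("/bigfile.dat"));
        BlockLocation[] blocks = fs.getFileBlockLocations(st, 0, st.getLen());
        for (BlockLocation b : blocks) {
            System.out.println("offset=" + b.getOffset()
                    + " len=" + b.getLength()
                    + " hosts=" + Arrays.toString(b.getHosts()));
        }
    }
}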

---

<configuration>
<property>
 <name>dfs.block.size</name>
 <value>34359738368</value>
 <description>The default block size for new files.</description>
</property>
<property>
 <name>io.file.buffer.size</name>
 <value>65536</value>
 <description>The size of buffer for use in sequence files.
 The size of this buffer should probably be a multiple of hardware
 page size (4096 on Intel x86), and it determines how much data is
 buffered during read and write operations.</description>
</property>
<property>
 <name>io.bytes.per.checksum</name>
 <value>4096</value>
 <description>The number of bytes per checksum.  Must not be larger than
 io.file.buffer.size.</description>
</property>
<property>
 <name>fs.default.name</name>
 <value>hdfs://localhost:9000</value>
</property>
<property>
 <name>mapred.job.tracker</name>
 <value>localhost:9001</value>
</property>
<property>
 <name>dfs.replication</name>
 <value>1</value>
</property>
</configuration>
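
For completeness, the block size doesn't have to be set cluster-wide in the config; it can also be passed per file at create time. A minimal sketch, assuming the stock 0.19 FileSystem API, that does the equivalent of the -put below with an explicit 32 GiB block size:

import java.io.FileInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Copies a local file into HDFS with a per-file block size, mirroring
// the dfs.block.size, io.file.buffer.size, and dfs.replication values above.
public class PutWithBlockSize {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // create(path, overwrite, bufferSize, replication, blockSize)
        FSDataOutputStream out = fs.create(new Path("/bigfile.dat"), true,
                65536, (short) 1, 34359738368L);
        FileInputStream in = new FileInputStream("/tmp/bigfile.dat");
        IOUtils.copyBytes(in, out, 65536, true); // true = close both streams
    }
}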


[...@... hadoop-0.19.1]$ bin/hadoop fs -put /tmp/bigfile.dat /
[...@... hadoop-0.19.1]$ bin/hadoop fs -cat /bigfile.dat | md5sum
09/03/14 15:52:34 WARN hdfs.DFSClient: Exception while reading from blk_-4992364814640383286_1013 of /bigfile.dat from 127.0.0.1:50010: java.io.IOException: BlockReader: error in packet header(chunkOffset : 415956992, dataLen : 41284, seqno : 0 (last: -1))
       at org.apache.hadoop.hdfs.DFSClient$BlockReader.readChunk(DFSClient.java:1186)
       at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:238)
       at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:190)
       at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
       at org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1060)
       at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1615)
       at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1665)
       at java.io.DataInputStream.read(DataInputStream.java:83)
       at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:53)
       at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
       at org.apache.hadoop.fs.FsShell.printToStdout(FsShell.java:120)
       at org.apache.hadoop.fs.FsShell.access$100(FsShell.java:49)
       at org.apache.hadoop.fs.FsShell$1.process(FsShell.java:351)
       at org.apache.hadoop.fs.FsShell$DelayedExceptionThrowing.globAndProcess(FsShell.java:1872)
       at org.apache.hadoop.fs.FsShell.cat(FsShell.java:345)
       at org.apache.hadoop.fs.FsShell.doall(FsShell.java:1519)
       at org.apache.hadoop.fs.FsShell.run(FsShell.java:1735)
       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
       at org.apache.hadoop.fs.FsShell.main(FsShell.java:1854)

09/03/14 15:52:34 INFO hdfs.DFSClient: Could not obtain block blk_-4992364814640383286_1013 from any node:  java.io.IOException: No live nodes contain current block
09/03/14 15:52:37 WARN hdfs.DFSClient: Exception while reading from blk_-4992364814640383286_1013 of /bigfile.dat from 127.0.0.1:50010: java.io.IOException: BlockReader: error in packet header(chunkOffset : 415956992, dataLen : 41284, seqno : 0 (last: -1))
       at org.apache.hadoop.hdfs.DFSClient$BlockReader.readChunk(DFSClient.java:1186)
       at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:238)
       at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:190)
       at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
       at org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1060)
       at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1615)
       at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1665)
       at java.io.DataInputStream.read(DataInputStream.java:83)
       at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:53)
       at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
       at org.apache.hadoop.fs.FsShell.printToStdout(FsShell.java:120)
       at org.apache.hadoop.fs.FsShell.access$100(FsShell.java:49)
       at org.apache.hadoop.fs.FsShell$1.process(FsShell.java:351)
       at org.apache.hadoop.fs.FsShell$DelayedExceptionThrowing.globAndProcess(FsShell.java:1872)
       at org.apache.hadoop.fs.FsShell.cat(FsShell.java:345)
       at org.apache.hadoop.fs.FsShell.doall(FsShell.java:1519)
       at org.apache.hadoop.fs.FsShell.run(FsShell.java:1735)
       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
       at org.apache.hadoop.fs.FsShell.main(FsShell.java:1854)

09/03/14 15:52:37 INFO hdfs.DFSClient: Could not obtain block blk_-4992364814640383286_1013 from any node:  java.io.IOException: No live nodes contain current block
09/03/14 15:52:40 WARN hdfs.DFSClient: DFS Read: java.io.IOException: BlockReader: error in packet header(chunkOffset : 415956992, dataLen : 41284, seqno : 0 (last: -1))
       at org.apache.hadoop.hdfs.DFSClient$BlockReader.readChunk(DFSClient.java:1186)
       at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:238)
       at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:190)
       at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
       at org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1060)
       at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1615)
       at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1665)
       at java.io.DataInputStream.read(DataInputStream.java:83)
       at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:53)
       at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
       at org.apache.hadoop.fs.FsShell.printToStdout(FsShell.java:120)
       at org.apache.hadoop.fs.FsShell.access$100(FsShell.java:49)
       at org.apache.hadoop.fs.FsShell$1.process(FsShell.java:351)
       at org.apache.hadoop.fs.FsShell$DelayedExceptionThrowing.globAndProcess(FsShell.java:1872)
       at org.apache.hadoop.fs.FsShell.cat(FsShell.java:345)
       at org.apache.hadoop.fs.FsShell.doall(FsShell.java:1519)
       at org.apache.hadoop.fs.FsShell.run(FsShell.java:1735)
       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
       at org.apache.hadoop.fs.FsShell.main(FsShell.java:1854)

cat: BlockReader: error in packet header(chunkOffset : 415956992, dataLen : 41284, seqno : 0 (last: -1))
ef8033a70b6691c2b99ad1c74583161a  -
[...@... hadoop-0.19.1]$
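
Two notes on the output above. The md5sum shown is computed over whatever bytes made it to stdout before cat gave up, so it is not the file's real digest. And one guess at the failure mode (an assumption on my part, not a confirmed diagnosis): the 1.6 GB file that worked fits below the signed 32-bit limit of 2147483647 bytes, while the 16.4 GB file, stored as a single block under the 32 GiB dfs.block.size, pushes in-block offsets past that limit, so any code path that narrows a block offset to an int wraps silently. A minimal sketch of the arithmetic:

// Assumption, not a confirmed diagnosis: in-block offsets past 2^31 - 1
// corrupt any computation that narrows them to a signed 32-bit int.
public class BlockOffsetOverflow {
    public static void main(String[] args) {
        long blockSize = 34359738368L;    // 2^35: the configured dfs.block.size
        long fileSize = 16400000000L;     // ~16.4 GB, so the file is ONE block
        System.out.println(fileSize <= blockSize);  // true: no block boundary
        long offset = 3L * 1024 * 1024 * 1024;      // 3 GiB into that block
        System.out.println((int) offset); // -1073741824: silent wraparound
    }
}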


