[ https://issues.apache.org/jira/browse/HDFS-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17743210#comment-17743210 ]

Ayush Saxena commented on HDFS-17069:
-------------------------------------

There is no hard restriction that it cannot be below 1m, and that exception is 
quite indicative of the actual cause: there is another config
{code:java}
<property>
  <name>dfs.namenode.fs-limits.min-block-size</name>
  <value>1048576</value>
  <description>Minimum block size in bytes, enforced by the Namenode at create
      time. This prevents the accidental creation of files with tiny block
      sizes (and thus many blocks), which can degrade performance. Support
      multiple size unit suffix (case insensitive), as described in dfs.blocksize.
  </description>
</property> {code}
This is enforced at the namenode and sets the minimum block size a client can 
ask for. It is 1048576 by default; if you set it to 0, the client can request 
any block size, and if you set it to 128mb, no block smaller than 128mb is 
allowed either.
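
For example, a minimal namenode-side hdfs-site.xml sketch (illustrative values only, assuming the 128k client-side dfs.blocksize from this report; the namenode has to be restarted to pick up the change):
{code:java}
<!-- hdfs-site.xml on the namenode; illustrative values, not a recommendation -->
<property>
  <name>dfs.namenode.fs-limits.min-block-size</name>
  <!-- 0 disables the minimum entirely; 131072 (128k) would be the smallest
       value that still accepts the reporter's dfs.blocksize of 128k -->
  <value>0</value>
</property>
{code}
A client can also pass the block size per command, e.g. bin/hdfs dfs -D dfs.blocksize=128k -put <file> <dir>, but the namenode will still reject it unless the minimum above allows it.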

 

Not a bug, nor do we need any doc improvements, resolving!!!

> The documentation and implementation of "dfs.blocksize" are inconsistent.
> -------------------------------------------------------------------------
>
>                 Key: HDFS-17069
>                 URL: https://issues.apache.org/jira/browse/HDFS-17069
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: dfs, documentation
>    Affects Versions: 3.3.6
>         Environment: Linux version 4.15.0-142-generic 
> (buildd@lgw01-amd64-039) (gcc version 5.4.0 20160609 (Ubuntu 
> 5.4.0-6ubuntu1~16.04.12))
> java version "1.8.0_162"
> Java(TM) SE Runtime Environment (build 1.8.0_162-b12)
> Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)
>            Reporter: ECFuzz
>            Priority: Major
>              Labels: pull-request-available
>
> My Hadoop version is 3.3.6, and I am using pseudo-distributed operation.
> core-site.xml is as below.
> {code:java}
> <configuration>
>   <property>
>         <name>fs.defaultFS</name>
>         <value>hdfs://localhost:9000</value>
>     </property>
>     <property>
>         <name>hadoop.tmp.dir</name>
>         <value>/home/hadoop/Mutil_Component/tmp</value>
>     </property>
>    
> </configuration>{code}
> hdfs-site.xml is as below.
> {code:java}
> <configuration>
>    <property>
>         <name>dfs.replication</name>
>         <value>1</value>
>     </property>
> <property>
>         <name>dfs.blocksize</name>
>         <value>128k</value>
>     </property>
>    
> </configuration>{code}
> Then I format the namenode and start HDFS.
> {code:java}
> hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ 
> bin/hdfs namenode -format
> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx(many info)
> hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ 
> sbin/start-dfs.sh
> Starting namenodes on [localhost]
> Starting datanodes
> Starting secondary namenodes [hadoop-Standard-PC-i440FX-PIIX-1996]{code}
> Finally, I use dfs to put a file, and I get a message saying that 128k is less
> than 1M.
>  
> {code:java}
> hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ 
> bin/hdfs dfs -mkdir -p /user/hadoop
> hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ 
> bin/hdfs dfs -mkdir input
> hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ 
> bin/hdfs dfs -put etc/hadoop/hdfs-site.xml input
> put: Specified block size is less than configured minimum value 
> (dfs.namenode.fs-limits.min-block-size): 131072 < 1048576
> {code}
> But I find that, according to the documentation in hdfs-default.xml, dfs.blocksize
> can be set to values like 128k.
> {code:java}
> The default block size for new files, in bytes. You can use the following 
> suffix (case insensitive): k(kilo), m(mega), g(giga), t(tera), p(peta), 
> e(exa) to specify the size (such as 128k, 512m, 1g, etc.), Or provide 
> complete size in bytes (such as 134217728 for 128 MB).{code}
> So, is there an issue with the documentation here? Or should users be advised to
> set this configuration to a value larger than 1M?
>  
> Additionally, I start YARN and run the provided MapReduce example job.
> {code:java}
> hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ 
> sbin/start-yarn.sh 
> hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ 
> bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar 
> grep input output 'dfs[a-z.]+'{code}
> And the shell throws an exception like the one below.
> {code:java}
> 2023-07-12 15:12:29,964 INFO client.DefaultNoHARMFailoverProxyProvider: 
> Connecting to ResourceManager at /0.0.0.0:8032
> 2023-07-12 15:12:30,430 INFO mapreduce.JobResourceUploader: Disabling Erasure 
> Coding for path: 
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1689145947338_0001
> 2023-07-12 15:12:30,542 INFO mapreduce.JobSubmitter: Cleaning up the staging 
> area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1689145947338_0001
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): Specified block 
> size is less than configured minimum value 
> (dfs.namenode.fs-limits.min-block-size): 131072 < 1048576
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2690)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2625)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:807)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:496)
>         at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048)
>         at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1513)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139)
>         at com.sun.proxy.$Proxy9.create(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:383)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:433)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
>         at com.sun.proxy.$Proxy10.create(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:280)
>         at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1271)
>         at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1250)
>         at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1232)
>         at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1170)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:569)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:566)
>         at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:580)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:507)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1233)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1210)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1091)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:489)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:430)
>         at 
> org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:2592)
>         at 
> org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:2558)
>         at 
> org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:2520)
>         at 
> org.apache.hadoop.mapreduce.JobResourceUploader.copyJar(JobResourceUploader.java:785)
>         at 
> org.apache.hadoop.mapreduce.JobResourceUploader.uploadJobJar(JobResourceUploader.java:451)
>         at 
> org.apache.hadoop.mapreduce.JobResourceUploader.uploadResourcesInternal(JobResourceUploader.java:211)
>         at 
> org.apache.hadoop.mapreduce.JobResourceUploader.uploadResources(JobResourceUploader.java:135)
>         at 
> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:99)
>         at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:194)
>         at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1678)
>         at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1675)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:1675)
>         at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1696)
>         at org.apache.hadoop.examples.Grep.run(Grep.java:78)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
>         at org.apache.hadoop.examples.Grep.main(Grep.java:103)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>         at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>         at 
> org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.util.RunJar.run(RunJar.java:328)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:241){code}



