[ https://issues.apache.org/jira/browse/HADOOP-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713768#action_12713768 ]
Joydeep Sen Sarma commented on HADOOP-5861:
-------------------------------------------

looks good to me ..

> s3n files are not getting split by default
> -------------------------------------------
>
>                 Key: HADOOP-5861
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5861
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 0.19.1
>         Environment: ec2
>            Reporter: Joydeep Sen Sarma
>            Assignee: Tom White
>         Attachments: hadoop-5861.patch
>
>
> Running with the stock ec2 scripts against hadoop-19, I tried to run a job
> against a directory with 4 text files, each about 2G in size. These were not
> split (only 4 mappers were run).
> The reason seems to have two parts - primarily, that S3N files report a
> block size of 5G. This causes FileInputFormat.getSplits to fall back on the
> goal size (which is totalsize/conf.get("mapred.map.tasks")). The goal size
> in this case was 4G - hence the files were not split. This is not an issue
> with other file systems, since the block size they report is much smaller
> and the splits are based on the block size (not the goal size).
> Can we make S3N files report a more reasonable block size?
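
To make the fallback concrete, here is a minimal, self-contained Java sketch of
the split sizing described above. It assumes the max(minSize, min(goalSize,
blockSize)) rule used by FileInputFormat.computeSplitSize; the class and
variable names are illustrative, not Hadoop's actual code, and the numbers
mirror the report (four 2G files, default mapred.map.tasks of 2).

    // Illustrative sketch, assuming the split-size rule
    // max(minSize, min(goalSize, blockSize)) from FileInputFormat.
    public class SplitSizeDemo {
        static long computeSplitSize(long goalSize, long minSize, long blockSize) {
            return Math.max(minSize, Math.min(goalSize, blockSize));
        }

        public static void main(String[] args) {
            long GB = 1024L * 1024L * 1024L;
            long totalSize = 4 * 2 * GB;           // four 2G input files
            int numMapTasks = 2;                   // mapred.map.tasks default
            long goalSize = totalSize / numMapTasks; // 4G, as in the report
            long minSize = 1;                      // mapred.min.split.size default

            long s3nBlockSize = 5 * GB;            // block size reported by S3N
            long hdfsBlockSize = 64 * 1024 * 1024; // a typical HDFS block size

            // S3N: min(4G goal, 5G block) = 4G, so each 2G file stays one split.
            System.out.println("s3n split size:  "
                    + computeSplitSize(goalSize, minSize, s3nBlockSize));
            // HDFS: min(4G goal, 64M block) = 64M, so the files do get split.
            System.out.println("hdfs split size: "
                    + computeSplitSize(goalSize, minSize, hdfsBlockSize));
        }
    }

With a 5G reported block size the computed split size (4G) exceeds each 2G
file, so every file becomes a single split and only 4 mappers run; a smaller
reported block size makes the block size win and the files split normally.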