[
https://issues.apache.org/jira/browse/HDFS-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14805243#comment-14805243
]
Akira AJISAKA commented on HDFS-8836:
-------------------------------------
I don't think the new option is reasonable. On HDFS, empty files should be
deleted because they need no extra storage but still consume NN heap. Therefore
it's better to remove empty files before merging them.
Even if you cannot delete the empty files for some reason, I would prefer using a find
command such as "hadoop fs -find <dir> -type f AND -depth 1 AND (NOT -size 0) |
xargs hadoop fs -getmerge -nl" rather than adding the new option.
Unfortunately, the find command for HDFS is still in development, so we cannot use
the -type, -depth, or -size options yet.
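
Until those find options exist, the same filtering could be done on the client
side. The following is only a rough sketch of that idea, not Hadoop code: it
uses java.nio on a local directory as a stand-in for HDFS, and the class and
method names (NonEmptyMerge, nonEmptyFiles, mergeWithNewlines) are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration: a local directory stands in for an HDFS directory.
public class NonEmptyMerge {

    /** Returns the regular, non-empty files directly under dir (depth 1), sorted by path. */
    static List<Path> nonEmptyFiles(Path dir) throws IOException {
        List<Path> sorted;
        try (Stream<Path> entries = Files.list(dir)) {
            sorted = entries.sorted().collect(Collectors.toList());
        }
        List<Path> result = new ArrayList<>();
        for (Path p : sorted) {
            // Equivalent in spirit to "-type f AND (NOT -size 0)".
            if (Files.isRegularFile(p) && Files.size(p) > 0) {
                result.add(p);
            }
        }
        return result;
    }

    /** Concatenates the given files into target, writing a newline after each one. */
    static void mergeWithNewlines(List<Path> files, Path target) throws IOException {
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path p : files) {
                Files.copy(p, out);
                out.write('\n');
            }
        }
    }
}
```

Calling mergeWithNewlines(nonEmptyFiles(dir), target) then merges only the
files that have content, so no stray newlines are produced for empty files.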
> Skip newline on empty files with getMerge -nl
> ---------------------------------------------
>
> Key: HDFS-8836
> URL: https://issues.apache.org/jira/browse/HDFS-8836
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 2.6.0, 2.7.1
> Reporter: Jan Filipiak
> Assignee: Kanaka Kumar Avvaru
> Priority: Trivial
> Attachments: HDFS-8836-01.patch, HDFS-8836-02.patch,
> HDFS-8836-03.patch, HDFS-8836-04.patch, HDFS-8836-05.patch
>
>
> Hello everyone,
> I recently needed to use the newline option -nl with getMerge
> because the files I needed to merge simply didn't have one. I was merging all
> the files from one directory and unfortunately this directory also included
> empty files, which effectively led to multiple newlines being appended after some
> files. I needed to remove them manually afterwards.
> In this situation it might be good to have another argument that allows
> skipping empty files.
> Things one could try in order to implement this feature:
> The call to IOUtils.copyBytes(in, out, getConf(), false); doesn't
> return the number of bytes copied, which would be convenient, as one could
> skip appending the newline when 0 bytes were copied. Alternatively, one could
> check the file size beforehand.
> I posted this idea on the mailing list
> http://mail-archives.apache.org/mod_mbox/hadoop-user/201507.mbox/%3C55B25140.3060005%40trivago.com%3E
> but I didn't really get many responses, so I thought I might try this way.
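
The byte-count idea from the description above could look roughly like the
sketch below. It is an illustration only and is not taken from any of the
attached patches: it uses plain java.io streams instead of Hadoop's IOUtils,
and copyAndCount and merge are hypothetical helper names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

// Hypothetical illustration using plain java.io streams, not Hadoop classes.
public class CountingMerge {

    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copyAndCount(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    /** Merges the sources into out, appending a newline only after non-empty sources. */
    static void merge(List<InputStream> sources, OutputStream out) throws IOException {
        for (InputStream in : sources) {
            try (InputStream src = in) {
                if (copyAndCount(src, out) > 0) {
                    out.write('\n');
                }
            }
        }
    }
}
```

With this shape no separate file-size lookup is needed, because the copy loop
itself reports whether a source was empty.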
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)