[ https://issues.apache.org/jira/browse/HDFS-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961710#comment-14961710 ]
Akira AJISAKA commented on HDFS-8836:
-------------------------------------
Some comments from me.
{code}
+ skipEmptyFileDelimiter = cf.getOpt("skip-empty-file") ? true : false;
{code}
1. {{? true : false}} is redundant and can be removed; {{cf.getOpt("skip-empty-file")}} is already used as a boolean here.
{code}
if (skipEmptyFileDelimiter && src.stat.getLen() == 0) {
  continue;
}
FSDataInputStream in = src.fs.open(src.path);
try {
  IOUtils.copyBytes(in, out, getConf(), false);
  if (delimiter != null) {
    out.write(delimiter.getBytes("UTF-8"));
  }
} finally {
  in.close();
}
{code}
2. Can we skip opening the file when its length is zero, as follows?
{code}
if (src.stat.getLen() != 0) {
  try (FSDataInputStream in = src.fs.open(src.path)) {
    IOUtils.copyBytes(in, out, getConf(), false);
    writeDelimiter(out);
  }
} else if (!skipEmptyFileDelimiter) {
  writeDelimiter(out);
}

private void writeDelimiter(FSDataOutputStream out) {
  ...
}
{code}
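To illustrate the branching suggested in point 2 without a cluster, here is a minimal sketch using plain {{java.io}} streams. {{MergeSketch}}, {{merge}}, and the byte arrays standing in for source files are hypothetical names for illustration only and are not part of the patch; {{writeDelimiter}} is my guess at what the elided helper might look like.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class MergeSketch {
  // Hypothetical stand-in for the getMerge loop: each byte[] plays the role
  // of one source file, and file.length plays the role of src.stat.getLen().
  static byte[] merge(List<byte[]> files, String delimiter,
      boolean skipEmptyFileDelimiter) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (byte[] file : files) {
      if (file.length != 0) {
        // Non-empty file: copy the content, then the delimiter.
        out.write(file);
        writeDelimiter(out, delimiter);
      } else if (!skipEmptyFileDelimiter) {
        // Empty file: nothing to open or copy, only the delimiter is written.
        writeDelimiter(out, delimiter);
      }
    }
    return out.toByteArray();
  }

  // Assumed shape of the helper; writes nothing when no delimiter is set.
  static void writeDelimiter(OutputStream out, String delimiter)
      throws IOException {
    if (delimiter != null) {
      out.write(delimiter.getBytes(StandardCharsets.UTF_8));
    }
  }

  public static void main(String[] args) throws IOException {
    List<byte[]> files = Arrays.asList(
        "a".getBytes(StandardCharsets.UTF_8),
        new byte[0],
        "b".getBytes(StandardCharsets.UTF_8));
    System.out.print(new String(merge(files, "\n", true), StandardCharsets.UTF_8));
  }
}
```

With the flag set, the empty file contributes nothing to the output; without it, the empty file still contributes one delimiter, which matches the current -nl behavior.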
{code:title=TestFsShellCopy#testCopyMerge}
// directory with 3 files, should skip subdir
{code}
3. The patch adds an empty file, so the directory now has 4 files and this comment should be updated.
> Skip newline on empty files with getMerge -nl
> ---------------------------------------------
>
> Key: HDFS-8836
> URL: https://issues.apache.org/jira/browse/HDFS-8836
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 2.6.0, 2.7.1
> Reporter: Jan Filipiak
> Assignee: Kanaka Kumar Avvaru
> Priority: Trivial
> Attachments: HDFS-8836-01.patch, HDFS-8836-02.patch,
> HDFS-8836-03.patch, HDFS-8836-04.patch, HDFS-8836-05.patch
>
>
> Hello everyone,
> I recently needed to use the newline option -nl with getMerge
> because the files I needed to merge simply didn't have one. I was merging all
> the files from one directory and unfortunately this directory also included
> empty files, which effectively led to multiple newlines being appended after some
> files. I needed to remove them manually afterwards.
> In this situation it may be good to have another argument that allows
> skipping empty files.
> Things one could try to implement this feature:
> The call to IOUtils.copyBytes(in, out, getConf(), false); doesn't
> return the number of bytes copied, which would be convenient, as one could
> skip appending the newline when 0 bytes were copied. Alternatively, one could
> check the file size beforehand.
> I posted this idea on the mailing list
> http://mail-archives.apache.org/mod_mbox/hadoop-user/201507.mbox/%3C55B25140.3060005%40trivago.com%3E
> but I didn't really get many responses, so I thought I might try this way.
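
For reference, the first idea in the description (a copy that reports how many bytes it moved) could look roughly like the sketch below. This is not Hadoop's {{IOUtils.copyBytes}}; {{CountingCopySketch}} and its {{copyBytes}} method are hypothetical names built on plain {{java.io}} to show the idea only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class CountingCopySketch {
  // Hypothetical helper: copies everything from in to out and returns the
  // number of bytes copied, so the caller can skip the delimiter on 0.
  static long copyBytes(InputStream in, OutputStream out, int bufferSize)
      throws IOException {
    if (bufferSize <= 0) {
      throw new IllegalArgumentException("bufferSize must be positive");
    }
    byte[] buf = new byte[bufferSize];
    long total = 0;
    int n;
    // read() returns -1 at end of stream.
    while ((n = in.read(buf)) != -1) {
      out.write(buf, 0, n);
      total += n;
    }
    return total;
  }
}
```

Checking the file length before opening, as the review comment proposes, avoids the open call for empty files entirely, so a counting copy like this is only needed if the length is not known up front.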
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)