[ https://issues.apache.org/jira/browse/HDFS-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394660#comment-14394660 ]
Umesh Kacha commented on HDFS-8060:
-----------------------------------
Hi Chrish, thanks for the prompt response. Let me explain my use case.
I have a Java application that collects data from database servers; it
collects roughly 1 TB every day. I need to compress these data files, which
are stored in the following directory structure:
ABC                 --dir
  2015-03-25        --dir
    0-1             --dir
      0-1.dat       --actual data file
    1-2             --dir
      1-2.dat
DEF                 --dir
  2015-03-25        --dir
    0-1             --dir
      0-1.dat       --actual data file
    1-2             --dir
      1-2.dat
As the structure above shows, there are hundreds of server directories (ABC,
DEF, and so on). Each server directory contains business-date directories,
each business date contains hourly directories (0-1, 1-2, ..., 23-24), and
each hourly directory contains the data file.
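A minimal sketch that encodes the layout above, to make the path shape concrete. `HourlyLayout` and its methods are hypothetical illustration names, not part of Hadoop; the glob returned by `dailyGlob` is only an assumed example of a pattern that would match every hourly file of every server for one business date in a glob-based listing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not Hadoop API) encoding the layout
// <server>/<business date>/<h>-<h+1>/<h>-<h+1>.dat described above.
final class HourlyLayout {
    private HourlyLayout() {}

    // Hourly bucket name for a start hour, e.g. 0 -> "0-1", 23 -> "23-24".
    static String bucket(int hour) {
        if (hour < 0 || hour > 23) {
            throw new IllegalArgumentException("hour must be in [0, 23]: " + hour);
        }
        return hour + "-" + (hour + 1);
    }

    // All 24 hourly bucket names in hour order: "0-1", "1-2", ..., "23-24".
    static List<String> hourlyBuckets() {
        List<String> buckets = new ArrayList<>(24);
        for (int h = 0; h < 24; h++) {
            buckets.add(bucket(h));
        }
        return buckets;
    }

    // Relative path of one hourly data file, e.g. "ABC/2015-03-25/0-1/0-1.dat".
    static String dataFile(String server, String businessDate, int hour) {
        String b = bucket(hour);
        return server + "/" + businessDate + "/" + b + "/" + b + ".dat";
    }

    // Assumed glob matching every server's hourly data files for one date.
    static String dailyGlob(String businessDate) {
        return "*/" + businessDate + "/*/*.dat";
    }

    public static void main(String[] args) {
        System.out.println(dataFile("ABC", "2015-03-25", 0)); // ABC/2015-03-25/0-1/0-1.dat
        System.out.println(hourlyBuckets().size());           // 24
        System.out.println(dailyGlob("2015-03-25"));          // */2015-03-25/*/*.dat
    }
}
```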
I have two jobs: a daily compression job and a weekly merge job. For the daily
compression I find each hourly data file with an fs.globStatus() pattern, so
that part is easy. To merge, however, I first have to copy all these hourly
files into one directory using

    FileUtil.copy(FileSystem srcFS, Path[] srcs, FileSystem dstFS, Path dst,
                  boolean deleteSource, boolean overwrite, Configuration conf)

and only then call copyMerge on that directory. With terabytes of files this
copy step is slow, and the merge that follows is slow as well. I hope this
makes my use case clearer.
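A minimal sketch of a one-pass recursive merge that avoids the intermediate copy: it walks the whole tree and streams every regular file, at any depth, into a single output. It is written against the local filesystem with `java.nio.file` so it runs without a cluster; `RecursiveMerge` is a hypothetical name, not Hadoop's `FileUtil` API. On HDFS the same loop could, as an assumption, be built from `FileSystem.listFiles(path, true)`, `FileSystem.open`, `FileSystem.create` and `IOUtils.copyBytes`.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Local-filesystem sketch (not Hadoop API): append every regular file under
// srcDir, at any depth, to dstFile in sorted path order.
final class RecursiveMerge {
    private RecursiveMerge() {}

    // Returns the number of bytes written to dstFile.
    static long merge(Path srcDir, Path dstFile) throws IOException {
        Path src = srcDir.toAbsolutePath().normalize();
        Path dst = dstFile.toAbsolutePath().normalize();
        // An output inside the tree being read could be picked up by the
        // walk itself, so refuse that configuration.
        if (dst.startsWith(src)) {
            throw new IllegalArgumentException("dstFile must be outside srcDir");
        }
        List<Path> files;
        try (Stream<Path> walk = Files.walk(src)) {     // visits all levels
            files = walk.filter(Files::isRegularFile)
                        .sorted()                        // plain path order
                        .collect(Collectors.toList());
        }
        long total = 0;
        try (OutputStream out = Files.newOutputStream(dst)) { // create or truncate
            for (Path f : files) {
                total += Files.copy(f, out);             // stream bytes, no temp copy
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java RecursiveMerge <srcDir> <dstFile>
        long bytes = merge(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("merged " + bytes + " bytes");
    }
}
```

Files are ordered by plain path comparison, so within one date the hourly directories come out as 0-1, 1-2, 10-11, ..., 2-3; a different sort is needed if the merged output must follow hour order.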
> org.apache.hadoop.fs.FileUtil.copyMerge should merge all files in recursive
> sub directories
> -------------------------------------------------------------------------------------------
>
> Key: HDFS-8060
> URL: https://issues.apache.org/jira/browse/HDFS-8060
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Umesh Kacha
>
> org.apache.hadoop.fs.FileUtil.copyMerge does not find files recursively in
> subdirectories. I am ready to push the code for this.
> This is my first JIRA, so I don't know the process well. Please validate; I
> feel this feature would be very helpful. Because copyMerge does not search
> subdirectories recursively, I first have to copy files from thousands of
> directories into one directory and then give that directory to copyMerge.
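One detail a recursive copyMerge would have to settle is file order. If the collected paths are sorted as plain strings, hourly directories come out as 0-1, 1-2, 10-11, ..., 2-3 rather than in hour order. A minimal sketch of a comparator keyed on the numeric start hour, assuming bucket names always have the form `<h>-<h+1>`; `HourlyOrder` and `BY_START_HOUR` are hypothetical names, not Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparator (not Hadoop API): orders hourly bucket names such
// as "0-1" ... "23-24" by numeric start hour instead of by string.
final class HourlyOrder {
    private HourlyOrder() {}

    // Assumed bucket-name shape: "<start>-<end>", both decimal integers.
    private static final Pattern BUCKET = Pattern.compile("(\\d+)-(\\d+)");

    // Start hour of a bucket name, e.g. "10-11" -> 10.
    static int startHour(String bucket) {
        Matcher m = BUCKET.matcher(bucket);
        if (!m.matches()) {
            throw new IllegalArgumentException("not an hourly bucket: " + bucket);
        }
        return Integer.parseInt(m.group(1));
    }

    static final Comparator<String> BY_START_HOUR =
            Comparator.comparingInt(HourlyOrder::startHour);

    public static void main(String[] args) {
        List<String> hours = new ArrayList<>(Arrays.asList("10-11", "2-3", "0-1", "1-2"));
        hours.sort(BY_START_HOUR);
        System.out.println(hours); // [0-1, 1-2, 2-3, 10-11]
    }
}
```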
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)