[ 
https://issues.apache.org/jira/browse/HDFS-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372761#comment-17372761
 ] 

Hongbing Wang commented on HDFS-14788:
--------------------------------------

Is there a plan to support filtering files by modification time (modtime)? In 
incremental data synchronization scenarios, being able to restrict the copy to 
files modified within a specified time window could greatly improve efficiency.
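As a rough illustration of what a time-window filter could look like, here is a minimal, Hadoop-free sketch. It assumes each source file is known by its path and its modification time in epoch milliseconds. `FileEntry` and `inWindow` are hypothetical names for this sketch only, not part of DistCp.

```java
import java.util.ArrayList;
import java.util.List;

public class ModTimeWindowSketch {

    /** Hypothetical stand-in for a source listing entry: path plus modification time (epoch millis). */
    static final class FileEntry {
        final String path;
        final long modTimeMillis;

        FileEntry(String path, long modTimeMillis) {
            this.path = path;
            this.modTimeMillis = modTimeMillis;
        }
    }

    /** Keep files whose modification time falls in the half-open window [startMillis, endMillis). */
    static List<String> inWindow(List<FileEntry> files, long startMillis, long endMillis) {
        if (startMillis > endMillis) {
            throw new IllegalArgumentException("start must not be after end");
        }
        List<String> selected = new ArrayList<>();
        for (FileEntry f : files) {
            if (f.modTimeMillis >= startMillis && f.modTimeMillis < endMillis) {
                selected.add(f.path);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<FileEntry> listing = List.of(
                new FileEntry("/logs/day1.log", 1_000L),
                new FileEntry("/logs/day2.log", 2_000L),
                new FileEntry("/logs/day3.log", 3_000L));
        // Incremental sync of the window [2000, 3000): only day2.log qualifies.
        System.out.println(inWindow(listing, 2_000L, 3_000L));
    }
}
```

The half-open window is a deliberate choice: consecutive incremental runs over [t0, t1), [t1, t2), ... never copy a file twice or skip a file whose timestamp lands exactly on a boundary.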

> Use dynamic regex filter to ignore copy of source files in Distcp
> -----------------------------------------------------------------
>
>                 Key: HDFS-14788
>                 URL: https://issues.apache.org/jira/browse/HDFS-14788
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: distcp
>    Affects Versions: 3.2.1
>            Reporter: Mukund Thakur
>            Assignee: Mukund Thakur
>            Priority: Major
>             Fix For: 3.3.0
>
>
> There is a feature in Distcp that lets us prevent specific files from being 
> copied to the destination. It is currently driven by a filter regex that is 
> read from a specific file. Creating a different regex file for each distcp 
> job is tedious. What we are proposing is to expose a regex_filter parameter 
> that can be set when the Distcp job is created, and to use this filter in a 
> new CopyFilter implementation class. 
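The proposal above can be sketched roughly as follows. This is a self-contained illustration, not DistCp's actual code: `PathFilter` and `ParameterRegexFilter` are hypothetical names loosely modeled on the idea of a CopyFilter, and the job parameters are assumed to be a plain `Map` keyed by `regex_filter` rather than a Hadoop `Configuration`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class RegexFilterSketch {

    /** Hypothetical stand-in for a copy filter: decides per path whether to copy. */
    interface PathFilter {
        boolean shouldCopy(String path);
    }

    /**
     * Hypothetical filter that takes its exclusion regex from job parameters
     * (a plain Map here) instead of reading it from a separate filters file.
     */
    static final class ParameterRegexFilter implements PathFilter {
        // Parameter name taken from the proposal; treating it as a map key is an assumption.
        static final String REGEX_KEY = "regex_filter";
        private final Pattern exclude;

        ParameterRegexFilter(Map<String, String> jobParams) {
            String regex = jobParams.get(REGEX_KEY);
            // No regex configured: exclude nothing.
            this.exclude = (regex == null || regex.isEmpty()) ? null : Pattern.compile(regex);
        }

        @Override
        public boolean shouldCopy(String path) {
            // Copy unless the whole path matches the exclusion pattern.
            return exclude == null || !exclude.matcher(path).matches();
        }
    }

    static List<String> selectForCopy(List<String> paths, PathFilter filter) {
        List<String> kept = new ArrayList<>();
        for (String p : paths) {
            if (filter.shouldCopy(p)) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The regex is supplied per job, so no filters file has to be created.
        PathFilter filter = new ParameterRegexFilter(
                Map.of("regex_filter", ".*\\.tmp$|.*/_temporary/.*"));
        List<String> kept = selectForCopy(
                List.of("/data/a.csv", "/data/b.tmp", "/data/_temporary/0/part-0", "/data/c.csv"),
                filter);
        System.out.println(kept); // [/data/a.csv, /data/c.csv]
    }
}
```

The point of the design is that each job carries its own pattern, so two concurrent jobs with different exclusions need no shared or per-job filter files.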



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
