[ 
https://issues.apache.org/jira/browse/NIFI-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15490689#comment-15490689
 ] 

ASF GitHub Bot commented on NIFI-1170:
--------------------------------------

Github user trixpan commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/980#discussion_r78770355
  
    --- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/TailFile.java
 ---
    @@ -117,31 +173,78 @@
                 .allowableValues(LOCATION_LOCAL, LOCATION_REMOTE)
                 .defaultValue(LOCATION_LOCAL.getValue())
                 .build();
    +
         static final PropertyDescriptor START_POSITION = new 
PropertyDescriptor.Builder()
                 .name("Initial Start Position")
    -            .description("When the Processor first begins to tail data, 
this property specifies where the Processor should begin reading data. Once 
data has been ingested from the file, "
    +            .description("When the Processor first begins to tail data, 
this property specifies where the Processor should begin reading data. Once 
data has been ingested from a file, "
                         + "the Processor will continue from the last point 
from which it has received data.")
                 .allowableValues(START_BEGINNING_OF_TIME, START_CURRENT_FILE, 
START_CURRENT_TIME)
                 .defaultValue(START_CURRENT_FILE.getValue())
                 .required(true)
                 .build();
     
    +    static final PropertyDescriptor RECURSIVE = new 
PropertyDescriptor.Builder()
    +            .name("tailfile-recursive-lookup")
    +            .displayName("Recursive lookup")
    +            .description("When using Multiple files mode, this property 
defines if files must be listed recursively or not"
    +                    + " in the base directory.")
    +            .allowableValues("true", "false")
    +            .defaultValue("true")
    +            .required(true)
    +            .build();
    +
    +    static final PropertyDescriptor ROLLING_STRATEGY = new 
PropertyDescriptor.Builder()
    +            .name("tailfile-rolling-strategy")
    +            .displayName("Rolling Strategy")
    +            .description("Specifies if the files to tail have a fixed name 
or not.")
    +            .required(true)
    +            .allowableValues(FIXED_NAME, CHANGING_NAME)
    +            .defaultValue(FIXED_NAME.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor LOOKUP_FREQUENCY = new 
PropertyDescriptor.Builder()
    +            .name("tailfile-lookup-frequency")
    +            .displayName("Lookup frequency")
    +            .description("Only used in Multiple files mode and Changing 
name rolling strategy, it specifies the minimum "
    +                    + "duration the processor will wait before listing 
again the files to tail.")
    +            .required(false)
    +            .addValidator(StandardValidators.TIME_PERIOD_VALIDATOR)
    +            .defaultValue("10 minutes")
    --- End diff --
    
    @joewitt 
    
    I think @pvillard31  has a point when he says the status of a file should 
ALWAYS be tracked unless:
    
    1. overwritten / reset by the user (causing data duplication).
    2. too old to be relevant (removed automatically)
    
    Under this arrangement, the two timers make sense:
    
    1 - Maximum age of file - if a file is older than this age it won't be 
tailed. (_this happens to be very similar to Heka's approach as well_)
    
    2 - How frequently to harvest for new files - self explanatory
    2b - if a new file is found, tail it. If a file already exists and is older 
than the max age, remove its status.
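
    The two timers above could be sketched roughly as follows. This is only an 
illustration: `TailStateTracker`, `harvest` and the use of a plain map from file 
name to last-modified time are hypothetical names and simplifications, not 
anything that exists in TailFile today.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of TailFile: keeps one read position per file.
class TailStateTracker {
    private final Map<String, Long> positions = new HashMap<>();
    private final long maxAgeMillis;

    TailStateTracker(long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
    }

    // One harvest pass, meant to run once per lookup period.
    // 'lastModified' maps each listed file name to its last-modified time
    // (epoch milliseconds). Files missing from the listing keep their state.
    void harvest(Map<String, Long> lastModified, long nowMillis) {
        for (Map.Entry<String, Long> entry : lastModified.entrySet()) {
            boolean tooOld = nowMillis - entry.getValue() > maxAgeMillis;
            if (tooOld) {
                // Older than the maximum age: drop any status we hold.
                positions.remove(entry.getKey());
            } else {
                // New file: start tracking. Known file: keep its position.
                positions.putIfAbsent(entry.getKey(), 0L);
            }
        }
    }

    void recordPosition(String file, long position) {
        positions.put(file, position);
    }

    Long position(String file) {
        return positions.get(file);
    }

    Set<String> trackedFiles() {
        return positions.keySet();
    }
}
```

    The point of the sketch is that the status is only ever dropped by the age 
check (or by an explicit user reset, not shown), never by the harvest itself.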
    
    In addition, we could consider what flume-ng taildir called an idle 
timeout:
    
    `idleTimeout - Time (ms) to close inactive files. If the closed file is 
appended new lines to, this source will automatically re-open it.`
    
    These are files that are younger than the maximum age but have largely 
stagnated. We would keep their status (until expiry), but they would be closed 
and only re-opened if the file size increased (or other tail conditions were 
triggered).
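
    A minimal sketch of that idle behaviour, assuming we only look at the file 
size on each poll. `IdleAwareFile` and `poll` are hypothetical names used for 
illustration, and the `open` flag stands in for a real file handle.

```java
// Hypothetical per-file state, not an existing NiFi or Flume class.
class IdleAwareFile {
    private final long idleTimeoutMillis;
    private long lastSize;
    private long lastActivityMillis;
    private boolean open = true; // stands in for an open file handle

    IdleAwareFile(long idleTimeoutMillis, long initialSize, long nowMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
        this.lastSize = initialSize;
        this.lastActivityMillis = nowMillis;
    }

    // Returns true when the file has grown since the previous poll.
    boolean poll(long currentSize, long nowMillis) {
        if (currentSize > lastSize) {
            open = true; // re-open if it had been closed for idleness
            lastSize = currentSize;
            lastActivityMillis = nowMillis;
            return true;
        }
        if (open && nowMillis - lastActivityMillis >= idleTimeoutMillis) {
            open = false; // close the handle, keep the tracked status
        }
        return false;
    }

    boolean isOpen() {
        return open;
    }
}
```

    Other tail conditions (for example a rotation check) are left out on 
purpose; only the size-based re-open is shown.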
    
    flume-ng tried to deal with resource waste by using an increasing delay to 
poll the idle files. The higher the number of polls without new data, the 
longer it would take before the next retry. Not sure if this is something we 
would like to do, but it would also help.
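
    As an illustration of the increasing delay, here is a small sketch. The 
doubling and the cap are my assumptions (I am not claiming this is how flume-ng 
computes it), and `IdlePollBackoff` is a hypothetical name. It assumes the base 
delay is positive and not larger than the cap.

```java
// Hypothetical backoff helper: the delay doubles after every empty poll,
// up to a cap, and resets as soon as new data is seen.
class IdlePollBackoff {
    private final long baseDelayMillis;
    private final long maxDelayMillis;
    private int emptyPolls = 0;

    IdlePollBackoff(long baseDelayMillis, long maxDelayMillis) {
        this.baseDelayMillis = baseDelayMillis;
        this.maxDelayMillis = maxDelayMillis;
    }

    // Call after each poll; returns how long to wait before the next one.
    long nextDelay(boolean foundNewData) {
        if (foundNewData) {
            emptyPolls = 0;
            return baseDelayMillis;
        }
        emptyPolls++;
        long delay = baseDelayMillis;
        for (int i = 0; i < emptyPolls && delay < maxDelayMillis; i++) {
            // Double without overflowing past the cap.
            delay = (delay > maxDelayMillis / 2) ? maxDelayMillis : delay * 2;
        }
        return delay;
    }
}
```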
    
    
    Inevitably, most teams using date-based naming conventions do so to avoid 
truncating a file when logrotate runs. I suspect we should simply let the user 
know that having too many files in the same folder matching the same URL would 
impact performance, and recommend compressing them so they don't match the 
file regex, or moving them to other directories, in order to minimise resource 
waste.
    
    Hope this makes sense


> TailFile "File to Tail" property should support Wildcards
> ---------------------------------------------------------
>
>                 Key: NIFI-1170
>                 URL: https://issues.apache.org/jira/browse/NIFI-1170
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>    Affects Versions: 0.4.0
>            Reporter: Andre
>
> Because of challenges around log rotation of high volume syslog and app 
> producers, it is customary for logging platform developers to promote 
> variable-based file names such as DynaFiles (rsyslog) and Macros (syslog-ng) 
> as alternatives to having SIGHUPs sent to the syslog daemon upon every 
> file rotation.
> (To a certain extent, NiFi itself uses similar patterns, for example 
> when one uses Expression Language to set the PutHDFS destination file.)
> The current TailFile strategy suggests rotation patterns like:
> {code}
> log_folder/app.log
> log_folder/app.log.1
> log_folder/app.log.2
> log_folder/app.log.3
> {code}
> It is possible to fool the system to accept wildcards by simply using a 
> strategy like:
> {code}
> log_folder/test1
> log_folder/server1
> log_folder/server2
> log_folder/server3
> {code}
> and configuring *Rolling Filename Pattern* to *, but it feels like a hack 
> rather than catering for an increasingly prevalent use case 
> (DynaFile/macros/etc).
> It would be great if, instead, TailFile had the ability to use wildcards in 
> the File to Tail property.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
