GitHub user joewitt commented on a diff in the pull request:
https://github.com/apache/nifi/pull/980#discussion_r78762052
--- Diff:
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/TailFile.java
---
@@ -117,31 +173,78 @@
.allowableValues(LOCATION_LOCAL, LOCATION_REMOTE)
.defaultValue(LOCATION_LOCAL.getValue())
.build();
+
static final PropertyDescriptor START_POSITION = new PropertyDescriptor.Builder()
.name("Initial Start Position")
- .description("When the Processor first begins to tail data, this property specifies where the Processor should begin reading data. Once data has been ingested from the file, "
+ .description("When the Processor first begins to tail data, this property specifies where the Processor should begin reading data. Once data has been ingested from a file, "
+ "the Processor will continue from the last point from which it has received data.")
.allowableValues(START_BEGINNING_OF_TIME, START_CURRENT_FILE, START_CURRENT_TIME)
.defaultValue(START_CURRENT_FILE.getValue())
.required(true)
.build();
+ static final PropertyDescriptor RECURSIVE = new PropertyDescriptor.Builder()
+ .name("tailfile-recursive-lookup")
+ .displayName("Recursive lookup")
+ .description("When using Multiple files mode, this property defines if files must be listed recursively or not"
+ + " in the base directory.")
+ .allowableValues("true", "false")
+ .defaultValue("true")
+ .required(true)
+ .build();
+
+ static final PropertyDescriptor ROLLING_STRATEGY = new PropertyDescriptor.Builder()
+ .name("tailfile-rolling-strategy")
+ .displayName("Rolling Strategy")
+ .description("Specifies if the files to tail have a fixed name or not.")
+ .required(true)
+ .allowableValues(FIXED_NAME, CHANGING_NAME)
+ .defaultValue(FIXED_NAME.getValue())
+ .build();
+
+ static final PropertyDescriptor LOOKUP_FREQUENCY = new PropertyDescriptor.Builder()
+ .name("tailfile-lookup-frequency")
+ .displayName("Lookup frequency")
+ .description("Only used in Multiple files mode and Changing name rolling strategy, it specifies the minimum "
+ + "duration the processor will wait before listing again the files to tail.")
+ .required(false)
+ .addValidator(StandardValidators.TIME_PERIOD_VALIDATOR)
+ .defaultValue("10 minutes")
--- End diff --
There appear to be two concerns here:
1) How often to look for new files (those not currently being watched/tailed).
2) At what point to consider a file fully consumed, so that it no longer needs
to be actively watched/tailed.
There should be a separate property for each concern. For (1), a rather low
default on the order of seconds to minutes sounds reasonable. For (2), a higher
default on the order of minutes to hours sounds reasonable. In either case, the
property description should clearly state what the property means and the
impact of setting it too low or too high, so users can decide whether to
override the default for their case.
In no case should either of these be 'infinite', and we must limit how many
files we track at once, since that becomes a resource concern. If this is
already accounted for, then great.
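To make the separation concrete, here is a minimal, self-contained sketch of the two settings plus a cap on tracked files. It is not NiFi code and not what the PR implements: the class TailTracker, its fields (lookupFrequency, maxIdleAge, maxTrackedFiles) and its methods are hypothetical names chosen for illustration, and it uses only java.time and java.util.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; none of these names exist in NiFi. */
final class TailTracker {
    private final Duration lookupFrequency; // concern (1): how often to list for new files
    private final Duration maxIdleAge;      // concern (2): idle time before a file is dropped
    private final int maxTrackedFiles;      // hard bound on tracked files (resource limit)
    private final Map<String, Instant> lastActivity = new LinkedHashMap<>();
    private Instant lastLookup; // null until the first listing

    TailTracker(Duration lookupFrequency, Duration maxIdleAge, int maxTrackedFiles) {
        // Reject zero/negative values so neither setting can behave as "infinite" or "never".
        if (lookupFrequency.isZero() || lookupFrequency.isNegative()
                || maxIdleAge.isZero() || maxIdleAge.isNegative()
                || maxTrackedFiles <= 0) {
            throw new IllegalArgumentException("durations and file cap must be positive");
        }
        this.lookupFrequency = lookupFrequency;
        this.maxIdleAge = maxIdleAge;
        this.maxTrackedFiles = maxTrackedFiles;
    }

    /** True when at least lookupFrequency has elapsed since the last listing. */
    boolean lookupDue(Instant now) {
        return lastLookup == null || !now.isBefore(lastLookup.plus(lookupFrequency));
    }

    /** Starts tracking newly listed files until the cap is reached; returns how many were added. */
    int registerListing(List<String> listedPaths, Instant now) {
        lastLookup = now;
        int added = 0;
        for (String path : listedPaths) {
            if (lastActivity.size() >= maxTrackedFiles) {
                break; // cap reached: remaining files wait for a later listing
            }
            if (lastActivity.putIfAbsent(path, now) == null) {
                added++;
            }
        }
        return added;
    }

    /** Records that data was read from a tracked file; untracked paths are ignored. */
    void recordData(String path, Instant now) {
        lastActivity.computeIfPresent(path, (p, previous) -> now);
    }

    /** Stops tracking files idle for longer than maxIdleAge; returns how many were dropped. */
    int expireIdle(Instant now) {
        int removed = 0;
        Iterator<Map.Entry<String, Instant>> it = lastActivity.entrySet().iterator();
        while (it.hasNext()) {
            if (Duration.between(it.next().getValue(), now).compareTo(maxIdleAge) > 0) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    int trackedCount() {
        return lastActivity.size();
    }
}
```

In this sketch, a lookup interval that is too low means listing the directory more often than needed, while one that is too high delays pickup of new files. An idle age that is too low drops files that may still receive data, while one that is too high keeps slots occupied under the cap. Those are the trade-offs each property description would need to spell out.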