Github user amberarrow commented on a diff in the pull request:

    https://github.com/apache/apex-malhar/pull/368#discussion_r74606819

    --- Diff: docs/operators/fsInputOperator.md ---
    @@ -0,0 +1,101 @@
    +File Input Operator
    +=============
    +
    +## Operator Objective
    +This operator scans a directory for files. Files are then read and split into tuples, which are emitted. The default implementation scans a single directory. The operator is fault tolerant. It tracks previously read files and current offset as part of checkpoint state. In case of failure the operator will skip files that were already processed and fast forward to the offset of the current file. Supports partitioning and changes to number of partitions. The directory scanner is responsible to only accept the files that belong to a partition.
    +
    +File Input Operator is **idempotent**, **fault-tolerant** and **partitionable**.
    +
    +## Operator Usecase
    +1. Read all files of a directory and then keep scanning it for newly added files.
    +
    +## Operator Information
    +1. Operator location: ***malhar-library***
    +2. Available since: ***1.0.2***
    +3. Operator state: ***Stable***
    +3. Java Packages:
    +    * Operator: ***[com.datatorrent.lib.io.fs.AbstractFileInputOperator](https://www.datatorrent.com/docs/apidocs/com/datatorrent/lib/io/fs/AbstractFileInputOperator.html)***
    +
    +### AbstractFileInputOperator
    +This is the abstract implementation that serves as base class for scanning a directory for files and read the files one by one. This class doesn't have any ports.
    +
    +
    +
    +## Properties, Attributes and Ports
    +### <a name="props"></a>Properties of AbstractFileInputOperator
    +| **Property** | **Description** | **Type** | **Mandatory** | **Default Value** |
    +| -------- | ----------- | ---- | ------------------ | ------------- |
    +| *directory* | absolute path of directory to be scanned | String | Yes | N/A |
    +| *scanIntervalMillis* | Interval in milliseconds after which directory should be scanned for new files | int | No | 5000 |
    +| *emitBatchSize* | Number of tuples to emit in a batch | int | No | 1000 |
    +| *partitionCount* | Desired number of partitions count | int | No | 1 |
    +| *maxRetryCount* | Maximum number of times the operator will attempt to process a file | No | 5 |
    --- End diff --

    Missing a column