Haaroon opened a new issue #13301:
URL: https://github.com/apache/pulsar/issues/13301
**Describe the bug**
When the file connector is configured with an input directory, a file name
filter, and `keepFile = true`, it re-reads the file as soon as it finishes
reading it, regardless of whether the file was modified. With
`keepFile = false` this does not happen, because the connector deletes the
file after the first read.
**To Reproduce**
Steps to reproduce the behavior:
1. Run Pulsar standalone and create a local file source connector that reads
the file, using the following config (any.yaml). Note that any.csv can be any
file.
```yaml
configs:
  inputDirectory: "/tmp/"
  recurse: false
  keepFile: true
  fileFilter: "any.csv"
```
2. Run the connector via the following command
```bash
pulsar-admin sources localrun \
--archive connectors/pulsar-io-file-2.8.1.nar \
--name spout \
--destination-topic-name file-raw \
--source-config-file any.yaml
```
3. Check the `msgIn` count in the topic stats, using the destination topic
from step 2:
```bash
./pulsar-admin topics stats file-raw | grep msgIn
```
You will see that msgIn exceeds the number of lines in the CSV file. If you
repeat the experiment with `keepFile: false` in the YAML, msgIn will not
exceed the number of lines in the CSV file.
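The comparison in step 3 can be scripted. The following is a minimal sketch, not part of the connector: the helper names `extract_msg_in` and `compare_counts` are hypothetical, and it assumes the topic stats JSON exposes a numeric `msgInCounter` field and that each line of the file produces one message.

```bash
#!/usr/bin/env bash
# Sketch: compare the CSV's line count with the topic's inbound message count.
# Assumption: the topic stats JSON contains a numeric "msgInCounter" field.

# Read topic stats JSON on stdin and print the value of "msgInCounter".
extract_msg_in() {
  sed -n 's/.*"msgInCounter" *: *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Print a verdict given the number of lines in the file and the msgIn count.
compare_counts() {
  local lines="$1" msg_in="$2"
  if [ "$msg_in" -gt "$lines" ]; then
    echo "re-read suspected: msgIn=$msg_in > lines=$lines"
  else
    echo "ok: msgIn=$msg_in <= lines=$lines"
  fi
}
```

With the connector running, a call might look like `compare_counts "$(wc -l < /tmp/any.csv | tr -d ' ')" "$(./pulsar-admin topics stats file-raw | extract_msg_in)"`, where `file-raw` is the destination topic from step 2.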
**Expected behavior**
The file connector should read the file once, not repeatedly.
If the file has changed, the connector should either re-read it or continue
from where it left off, and this behavior should be controlled by a config
setting.
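To make the expected "read once unless modified" rule concrete, here is a rough sketch in shell, outside the connector. It is not how the connector is implemented; the helpers `marker_for`, `needs_read`, and `mark_read` and the state-directory layout are hypothetical. It assumes a single, non-recursive input directory (as with `recurse: false`), so file base names are unique.

```bash
#!/usr/bin/env bash
# Sketch of a "read once unless modified" check based on modification times.
# Hypothetical helpers; the state directory holds one marker per input file.

# Path of the marker that records the mtime of the last processed version.
marker_for() {
  local file="$1" state_dir="$2"
  echo "$state_dir/$(basename "$file").read"
}

# Succeed (exit 0) if the file was never read or changed since the last read.
needs_read() {
  local marker
  marker="$(marker_for "$1" "$2")"
  [ ! -e "$marker" ] || [ "$1" -nt "$marker" ]
}

# Record that the current version of the file has been processed.
mark_read() {
  local marker
  marker="$(marker_for "$1" "$2")"
  # touch -r copies the file's modification time onto the marker
  # (assumes the platform preserves the timestamp precision).
  touch -r "$1" "$marker"
}
```

A polling loop would call `needs_read` for each matching file, process the file only when it succeeds, and then call `mark_read`. An unmodified file is skipped on later passes, while a file whose modification time advances is picked up again.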
**Desktop (please complete the following information):**
- OS: macOS (latest); Pulsar 2.8.1 (latest release).
This issue is still present in the latest build of Pulsar, because the source
code for the file connector has not changed in years.