[ https://issues.apache.org/jira/browse/APEXMALHAR-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15309282#comment-15309282 ]
ASF GitHub Bot commented on APEXMALHAR-2103:
--------------------------------------------
Github user chaithu14 commented on a diff in the pull request:
https://github.com/apache/incubator-apex-malhar/pull/300#discussion_r65303185
--- Diff: library/src/main/java/com/datatorrent/lib/io/fs/FileSplitterInput.java ---
@@ -375,11 +374,18 @@ public void run()
lastScannedInfo = null;
numDiscoveredPerIteration = 0;
for (String afile : files) {
- String filePath = new File(afile).getAbsolutePath();
- LOG.debug("Scan started for input {}", filePath);
- Map<String, Long> lastModifiedTimesForInputDir;
- lastModifiedTimesForInputDir = referenceTimes.get(filePath);
- scan(new Path(afile), null, lastModifiedTimesForInputDir);
+ Path filePath = new Path(afile);
+ LOG.debug("Scan started for input {}", filePath.toString());
+ Map<String, Long> lastModifiedTimesForInputDir = null;
+ if (fs.exists(filePath)) {
+ FileStatus fileStatus = fs.getFileStatus(filePath);
+ if (fileStatus.isDirectory()) {
+ lastModifiedTimesForInputDir = referenceTimes.get(fileStatus.getPath().toString());
+ } else {
+ lastModifiedTimesForInputDir = referenceTimes.get(fileStatus.getPath().getParent().toString());
--- End diff --
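The key selection in the new code can be sketched without Hadoop as follows. This is a simplified illustration, not the Malhar implementation: paths are plain absolute strings instead of Hadoop Path/FileStatus objects, and the class name ReferenceKeySketch, the method referenceKey and its isDirectory flag (standing in for FileStatus.isDirectory()) are all hypothetical. A directory input is looked up under its own path, and a file input under its parent directory's path.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, Hadoop-free sketch of the key selection in the diff.
// Paths are plain absolute strings; "isDirectory" stands in for FileStatus.isDirectory().
final class ReferenceKeySketch {

  /** Key under which reference times are looked up for an input path. */
  static String referenceKey(String path, boolean isDirectory) {
    if (isDirectory) {
      return path; // a directory input is keyed by its own path
    }
    int slash = path.lastIndexOf('/');
    if (slash < 0) {
      throw new IllegalArgumentException("expected an absolute path: " + path);
    }
    // a file input is keyed by its parent directory ("/" for top-level files)
    return slash == 0 ? "/" : path.substring(0, slash);
  }

  public static void main(String[] args) {
    Map<String, Map<String, Long>> referenceTimes = new HashMap<>();
    Map<String, Long> times = new HashMap<>();
    times.put("/home/myDir/file1.txt", 1000L);
    referenceTimes.put("/home/myDir", times);

    // Both inputs resolve to the same entry, so both see the same times.
    System.out.println(referenceTimes.get(referenceKey("/home/myDir", true)));
    System.out.println(referenceTimes.get(referenceKey("/home/myDir/file1.txt", false)));
  }
}
```

Under this scheme, configuring both /home/myDir and /home/myDir/file1.txt as inputs resolves to a single map entry rather than two.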
@Priyanka: If we maintain two different keys, FileSplitter will emit
multiple file metadata records for /home/myDir/file1.txt. Is that the expected
behavior? Please correct me if I am wrong.
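A toy model of the concern: if the same file is tracked under two keys (for example, /home/myDir and /home/myDir/file1.txt), each key starts with no recorded time for the file, so the file is treated as new once per key. The class DuplicateEmissionSketch and its methods are hypothetical and only approximate the scanner's "already seen?" check; they are not the actual TimeBasedScanner logic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model: reference times kept per key, checked before emitting.
final class DuplicateEmissionSketch {
  // key -> (file path -> last-modified time already emitted)
  private final Map<String, Map<String, Long>> referenceTimes = new HashMap<>();

  /** Returns true if the file would be emitted when scanned under the given key. */
  boolean scan(String key, String file, long modTime) {
    Map<String, Long> seen = referenceTimes.computeIfAbsent(key, k -> new HashMap<>());
    Long previous = seen.get(file);
    if (previous != null && previous >= modTime) {
      return false; // this version of the file was already emitted under this key
    }
    seen.put(file, modTime);
    return true;
  }

  /** Counts emissions of one unchanged file reached through several input keys. */
  static int emissions(String[] keys, String file, long modTime) {
    DuplicateEmissionSketch scanner = new DuplicateEmissionSketch();
    int count = 0;
    for (String key : keys) {
      if (scanner.scan(key, file, modTime)) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    String file = "/home/myDir/file1.txt";
    // Two different keys track the same file: it is emitted twice.
    System.out.println(emissions(new String[] {"/home/myDir", file}, file, 1000L));
    // One shared key: the second lookup finds the recorded time and skips the file.
    System.out.println(emissions(new String[] {"/home/myDir", "/home/myDir"}, file, 1000L));
  }
}
```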
> scanner issues in FileSplitterInput class
> -----------------------------------------
>
> Key: APEXMALHAR-2103
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2103
> Project: Apache Apex Malhar
> Issue Type: Bug
> Reporter: Chaitanya
> Assignee: Chaitanya
>
> Issue: FileSplitter continuously emits file metadata even though there is
> only a single file.
> Observation: For the same file, the keys used to update and to access the
> referenceTimes map differ between FileSplitterInput and TimeBasedScanner.
> Because of this, oldestTimeModification is always null in
> TimeBasedScanner.
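The failure mode in the observation can be reproduced in a small simulation. It assumes, purely for illustration, that one side stores reference times under a scheme-qualified string (file:/home/myDir) while the other looks them up with a plain absolute path (/home/myDir), similar to what the removed new File(afile).getAbsolutePath() line produced; the issue does not state the exact two formats. The class KeyMismatchSketch and its helpers are hypothetical. With mismatched keys the lookup always returns null, so an unchanged file is emitted on every scan iteration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of the key mismatch; key formats are illustrative assumptions.
final class KeyMismatchSketch {
  static String writerKey(String dir) { return "file:" + dir; } // assumed qualified form
  static String readerKey(String dir) { return dir; }           // assumed plain form

  /** Number of times one unchanged file is emitted over several scan iterations. */
  static int emitsOverIterations(boolean sameKey, int iterations) {
    Map<String, Map<String, Long>> referenceTimes = new HashMap<>();
    String dir = "/home/myDir";
    String file = dir + "/file1.txt";
    long modTime = 1000L;
    int emitted = 0;
    for (int i = 0; i < iterations; i++) {
      String lookupKey = sameKey ? writerKey(dir) : readerKey(dir);
      Map<String, Long> oldTimes = referenceTimes.get(lookupKey);
      // With mismatched keys oldTimes is always null, so oldestTime is always null.
      Long oldestTime = oldTimes == null ? null : oldTimes.get(file);
      if (oldestTime == null || modTime > oldestTime) {
        emitted++;
        // Times are always recorded under the writer's key.
        referenceTimes.computeIfAbsent(writerKey(dir), k -> new HashMap<>()).put(file, modTime);
      }
    }
    return emitted;
  }

  public static void main(String[] args) {
    System.out.println(emitsOverIterations(false, 3)); // mismatched keys: emitted every time
    System.out.println(emitsOverIterations(true, 3));  // consistent keys: emitted once
  }
}
```

The fix direction in the diff is to derive the lookup key from the same source (the FileStatus path) on both sides, which corresponds to the sameKey case here.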
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)