[ https://issues.apache.org/jira/browse/FLINK-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426400#comment-15426400 ]
ASF GitHub Bot commented on FLINK-4329:
---------------------------------------
Github user rmetzger commented on a diff in the pull request:
https://github.com/apache/flink/pull/2350#discussion_r75303846
--- Diff: flink-fs-tests/src/test/java/org/apache/flink/hdfstests/ContinuousFileMonitoringTest.java ---
@@ -106,6 +109,140 @@ public static void destroyHDFS() {
// TESTS
@Test
+ public void testFileReadingOperatorWithIngestionTime() throws Exception {
+ Set<org.apache.hadoop.fs.Path> filesCreated = new HashSet<>();
+ Map<Integer, String> expectedFileContents = new HashMap<>();
+ for(int i = 0; i < NO_OF_FILES; i++) {
+ Tuple2<org.apache.hadoop.fs.Path, String> file = fillWithData(hdfsURI, "file", i, "This is test line.");
+ filesCreated.add(file.f0);
+ expectedFileContents.put(i, file.f1);
+ }
+
+ TextInputFormat format = new TextInputFormat(new Path(hdfsURI));
+ TypeInformation<String> typeInfo = TypeExtractor.getInputFormatTypes(format);
+
+ ContinuousFileReaderOperator<String, ?> reader = new ContinuousFileReaderOperator<>(format);
+
+ StreamConfig streamConfig = new StreamConfig(new Configuration());
+ streamConfig.setTimeCharacteristic(TimeCharacteristic.IngestionTime);
+
+ ExecutionConfig executionConfig = new ExecutionConfig();
+ executionConfig.setAutoWatermarkInterval(100);
+
+ TestTimeServiceProvider timeServiceProvider = new TestTimeServiceProvider();
+ OneInputStreamOperatorTestHarness<FileInputSplit, String> tester =
+ new OneInputStreamOperatorTestHarness<>(reader, executionConfig, timeServiceProvider, streamConfig);
+
+ reader.setOutputType(typeInfo, new ExecutionConfig());
+ tester.open();
+
+ timeServiceProvider.setCurrentTime(0);
+
+ long elementTimestamp = 201;
+ timeServiceProvider.setCurrentTime(elementTimestamp);
+
+ // test that a watermark is actually emitted
+ Assert.assertTrue(tester.getOutput().size() == 1 &&
+ tester.getOutput().peek() instanceof Watermark &&
+ ((Watermark) tester.getOutput().peek()).getTimestamp() == 200);
--- End diff --
You don't need to change it, but I think it's a good idea to test the
conditions independently. This allows you to see which condition was false,
based on the line number.
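
To illustrate the suggestion, here is a minimal, self-contained sketch of checking the three conditions one at a time. It does not use Flink or JUnit: the nested Watermark class and the check() helper are hypothetical stand-ins so that the example compiles on its own. In the actual test, the equivalent would presumably be separate JUnit calls such as Assert.assertEquals and Assert.assertTrue on tester.getOutput().

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class IndependentAssertions {

    /** Hypothetical stand-in for Flink's Watermark, only here to keep the sketch self-contained. */
    public static final class Watermark {
        private final long timestamp;

        public Watermark(long timestamp) {
            this.timestamp = timestamp;
        }

        public long getTimestamp() {
            return timestamp;
        }
    }

    /** Minimal replacement for a JUnit-style assertion: fails with the given message. */
    public static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    /** Each condition gets its own check, so a failure points at exactly one of them. */
    public static void assertSingleWatermark(Queue<Object> output, long expectedTimestamp) {
        check(output.size() == 1,
                "expected exactly one output element but got " + output.size());

        Object head = output.peek();
        check(head instanceof Watermark,
                "expected a Watermark but got " + head);

        long actual = ((Watermark) head).getTimestamp();
        check(actual == expectedTimestamp,
                "expected watermark timestamp " + expectedTimestamp + " but got " + actual);
    }

    public static void main(String[] args) {
        Queue<Object> output = new ArrayDeque<>();
        output.add(new Watermark(200));
        assertSingleWatermark(output, 200); // passes silently
    }
}
```

With the combined `&&` expression, a failure only reports that the whole expression was false. With separate checks, the failing line (and here also the message) identifies whether the size, the type, or the timestamp was wrong.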
> Fix Streaming File Source Timestamps/Watermarks Handling
> --------------------------------------------------------
>
> Key: FLINK-4329
> URL: https://issues.apache.org/jira/browse/FLINK-4329
> Project: Flink
> Issue Type: Bug
> Components: Streaming Connectors
> Affects Versions: 1.1.0
> Reporter: Aljoscha Krettek
> Assignee: Kostas Kloudas
> Fix For: 1.1.1
>
>
> The {{ContinuousFileReaderOperator}} does not correctly deal with watermarks,
> i.e. they are just passed through. This means that when the
> {{ContinuousFileMonitoringFunction}} closes and emits a {{Long.MAX_VALUE}}
> watermark, that watermark can "overtake" the records that are still to be
> emitted by the {{ContinuousFileReaderOperator}}. Together with the new
> "allowed lateness" setting in the window operator, this can lead to elements
> being dropped as late.
> Also, {{ContinuousFileReaderOperator}} does not correctly assign ingestion
> timestamps since it is not technically a source but looks like one to the
> user.
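
The "overtaking" described in the issue can be illustrated with a toy model. This is only a sketch and not Flink code: the class and method names are hypothetical, and the lateness rule is deliberately simplified to "a record is late if its timestamp is at or below the current watermark", with an allowed lateness of zero.

```java
public class WatermarkOvertakeSketch {

    /**
     * Counts how many pending records a downstream operator would treat as late.
     *
     * Simplifying assumption (not Flink's exact rule): a record is late when its
     * timestamp is less than or equal to the watermark seen so far.
     *
     * @param recordTimestamps   timestamps of records the reader still has to emit
     * @param closingWatermark   watermark emitted when the monitoring function closes
     * @param watermarkOvertakes true if the watermark is passed through before the
     *                           pending records, false if it is held back until after them
     */
    public static int countDroppedAsLate(
            long[] recordTimestamps, long closingWatermark, boolean watermarkOvertakes) {

        long currentWatermark = Long.MIN_VALUE;
        if (watermarkOvertakes) {
            // Pass-through behaviour: downstream sees the watermark first.
            currentWatermark = closingWatermark;
        }

        int dropped = 0;
        for (long timestamp : recordTimestamps) {
            if (timestamp <= currentWatermark) {
                dropped++;
            }
        }
        // When the watermark is held back, it would be forwarded only at this point,
        // after every pending record has already been emitted.
        return dropped;
    }

    public static void main(String[] args) {
        long[] pending = {10L, 20L, 30L};
        System.out.println(countDroppedAsLate(pending, Long.MAX_VALUE, true));  // prints 3
        System.out.println(countDroppedAsLate(pending, Long.MAX_VALUE, false)); // prints 0
    }
}
```

In this model every pending record is dropped once a {{Long.MAX_VALUE}} watermark gets ahead of it, and none are dropped if the watermark is forwarded only after the records.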