MartijnVisser opened a new pull request, #28565:
URL: https://github.com/apache/flink/pull/28565

   ## What is the purpose of the change
   
   This refactors the existing tests that still rely on the deprecated 
`StreamingFileSink` (relocated to the 
`org.apache.flink.streaming.api.functions.sink.filesystem.legacy` package) to 
use its replacement, `FileSink` (the `sink2` `Sink` API, attached via 
`sinkTo(...)`). This is the test-migration part of FLINK-30166, which unblocks 
the eventual removal of `StreamingFileSink` (FLINK-28641).
   
   A previous attempt (#21371) was reverted (#21628) because it (a) deleted a 
state-migration test that is still needed and (b) removed the 
`StreamingFileSink` option from the file-sink E2E program while the nightly 
script still requested it, causing FLINK-30605. This PR avoids both pitfalls.
   
   ## Brief change log
   
   - Format ITCases migrated from 
`StreamingFileSink.forBulkFormat/forRowFormat(...).build()` + `addSink` to 
`FileSink...build()` + `sinkTo` (Avro, Parquet, SequenceFile, Compress, ORC). 
Files whose class name embedded `StreamingFileSink` were renamed (e.g. 
`AvroStreamingFileSinkITCase` -> `AvroFileSinkITCase`).
   - Bulk-writer harness tests (`CompressWriterFactoryTest`, 
`OrcBulkWriterTest`, `OrcBulkRowDataWriterTest`, 
`OrcFileSystemITCase#initNestedTypesFile`) reworked to drive the 
`BulkWriter.Factory` directly via `factory.create(...)` + `finish()`, because 
`FileSink` is a `sink2` `Sink` and cannot be wrapped in the legacy `StreamSink` 
operator those tests used.
   - E2E: `FileSinkProgram` drops the `StreamingFileSink` branch and its 
`sinkToTest` parameter; `test_file_sink.sh` and `run-nightly-tests.sh` are 
updated in the same commit so the removed sink type is no longer requested. 
`StreamSQLTestProgram` switches to `FileSink`.
   - Dependencies: `flink-connector-files` added in test scope for 
flink-compress and flink-sequence-file (replacing the now-unused 
`flink-streaming-java` test-jar), and in compile scope for 
`flink-stream-sql-test` (since `FileSink` is not in `/lib` and must be shaded 
into the program jar).
   
   ## Scope (intentionally out of scope)
   
   The following keep `StreamingFileSink` on purpose and belong to follow-up 
tickets:
   
   - `FileSinkMigrationITCase` and 
`FileWriterBucketStateSerializerMigrationTest` verify state migration *from* 
`StreamingFileSink`; they remain until the class is removed (FLINK-28641).
   - `LocalStreamingFileSinkTest`, `BulkWriterTest`, `TestUtils` test the 
legacy class itself.
   - `CompactFileWriterTest`/`StreamingFileWriterTest` (flink-connector-files) 
and `HadoopPathBasedPartFileWriterITCase` (flink-hadoop-bulk) are blocked by 
FLINK-30627: the production `FileSystemTableSink`/`AbstractStreamingWriter` and 
`HadoopPathBasedBulkFormatBuilder` still consume/extend 
`StreamingFileSink.BucketsBuilder`.
   
   Coverage note: the four reworked harness tests no longer drive 
`snapshot()`/`notifyOfCompletedCheckpoint()` or bucket-directory creation. 
Those paths were incidental to writer-level format tests; the `FileSink` 
checkpoint/commit lifecycle remains covered by the job-level ITCases and 
`flink-connector-files`' own tests.
   
   ## Verifying this change
   
   This change is already covered by existing tests (the migrated tests 
themselves). Verified locally: all migrated modules compile, both E2E programs 
package (shade succeeds), and the migrated tests pass — 
`CompressWriterFactoryTest` (12), ORC writer tests (2), format ITCases for 
Avro/Parquet/SequenceFile/Compression (9), and ORC ITCases (43).
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): yes (test/build 
scope only: `flink-connector-files` added to a few format/E2E modules; unused 
`flink-streaming-java` test dependency removed)
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? no
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [X] Yes (Claude Code, Opus 4.8)
   
   Generated-by: Claude Code (Opus 4.8)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to