> I'm not sure if we want to check the BOM during split generation.
>
> 1. This might become a bottleneck, since splits are generated by the
> JobManager. OTOH, there is currently an effort to parallelize split
> generation.
> 2. FileInputFormat is currently not handling any charset issues.
>
> An alternative would be to check the BOM in `DelimitedInputFormat` when a
> split is opened.
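Checking the BOM when a split is opened comes down to inspecting the first bytes of the file. Below is a minimal Java sketch. It assumes a hypothetical helper `BomDetector` that is not part of Flink's `DelimitedInputFormat` API, and it recognizes the UTF-8, UTF-16 and UTF-32 byte order marks:

```java
/** Hypothetical helper (not a Flink API): detects a byte order mark at the start of a file. */
final class BomDetector {

    /** Detected charset name plus the number of BOM bytes that must be skipped. */
    static final class Bom {
        final String charsetName;
        final int length;

        Bom(String charsetName, int length) {
            this.charsetName = charsetName;
            this.length = length;
        }
    }

    /** True if the first len bytes of head start with the given unsigned byte values. */
    private static boolean startsWith(byte[] head, int len, int... prefix) {
        if (len < prefix.length || head.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((head[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** Returns the detected BOM, or null if the first len bytes carry no known BOM. */
    static Bom detect(byte[] head, int len) {
        // UTF-32 must be tested before UTF-16, because the UTF-32LE BOM
        // (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
        if (startsWith(head, len, 0x00, 0x00, 0xFE, 0xFF)) return new Bom("UTF-32BE", 4);
        if (startsWith(head, len, 0xFF, 0xFE, 0x00, 0x00)) return new Bom("UTF-32LE", 4);
        if (startsWith(head, len, 0xEF, 0xBB, 0xBF)) return new Bom("UTF-8", 3);
        if (startsWith(head, len, 0xFE, 0xFF)) return new Bom("UTF-16BE", 2);
        if (startsWith(head, len, 0xFF, 0xFE)) return new Bom("UTF-16LE", 2);
        return null;
    }
}
```

Reading four bytes from the start of the file is enough to tell all five marks apart.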
@fhueske Hi fhueske, if the BOM is checked in `DelimitedInputFormat` when a split is opened, I think the following should be considered:

1. The splits of one file may be processed on different TaskManagers. Every TaskManager that opens a split of that file would then have to read the file's BOM to determine its encoding, even though only the first split actually contains the BOM.

[ Full content available at: https://github.com/apache/flink/pull/6710 ]
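The following hedged sketch shows what the TaskManager-side check in point 1 would involve when a split is opened. `SplitBomCheck`, `readFileHead` and `bytesToSkipAtSplitStart` are hypothetical names. Plain `java.nio` stands in for Flink's file system abstraction, and BOM recognition is passed in as a function so any detector can be plugged in:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.ToIntFunction;

/** Hypothetical sketch (not a Flink API) of a per-split BOM check. */
final class SplitBomCheck {

    /** Reads up to n bytes from offset 0 of the file, independent of where the split starts. */
    static byte[] readFileHead(Path file, int n) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(n);
            while (buf.hasRemaining()) {
                // Positional read: the file offset equals the number of bytes read so far.
                if (ch.read(buf, buf.position()) < 0) {
                    break; // file is shorter than n bytes
                }
            }
            buf.flip();
            byte[] head = new byte[buf.remaining()];
            buf.get(head);
            return head;
        }
    }

    /**
     * Bytes to skip at the start of a split. Every split pays for the extra head read
     * (the detected BOM would also tell it which charset to decode with), but only the
     * split starting at offset 0 actually contains the BOM bytes.
     */
    static int bytesToSkipAtSplitStart(Path file, long splitStart,
                                       ToIntFunction<byte[]> bomLength) throws IOException {
        byte[] head = readFileHead(file, 4); // 4 bytes cover every UTF-8/16/32 BOM
        int bom = bomLength.applyAsInt(head);
        return splitStart == 0 ? bom : 0;
    }
}
```

The cost shows up directly here. The split at offset 0 sees the BOM as part of its own data, while every other split needs an additional read at the head of the file, which may be on a remote file system.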