> I'm not sure if we want to check the BOM during split generation.
> 
> 1. This might become a bottleneck, since splits are generated by the 
> JobManager. OTOH, there is currently an effort to parallelize split 
> generation.
> 2. FileInputFormat is currently not handling any charset issues.
> 
> An alternative would be to check the BOM in `DelimitedInputFormat` when a 
> split is opened.

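@fhueske Hi, if we check the BOM in `DelimitedInputFormat` when a split is 
opened, I think the following should be considered:
1. A file can be divided into splits that are processed on different 
TaskManagers, so the file's BOM would have to be checked on each of those 
TaskManagers. The BOM sits only at the very start of the file, so a task 
whose split does not begin at offset 0 would have to read the file header 
separately.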
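
To make the per-split check concrete, here is a minimal sketch in plain Java. It is not Flink code and is not taken from this PR; the class and method names (`BomSniffer`, `detect`, `readHead`) are hypothetical. It assumes each task can open the file from the beginning and read its first four bytes, regardless of where its own split starts.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Flink.
public class BomSniffer {

    /** Returns the charset name implied by a leading BOM, or null if there is none. */
    public static String detect(byte[] head) {
        // UTF-32 is checked first because the UTF-32LE BOM (FF FE 00 00)
        // begins with the UTF-16LE BOM (FF FE). A UTF-16LE file whose first
        // character is U+0000 would be misread here; that case is ignored.
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            // Mask to compare the byte as an unsigned value.
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    /** Reads up to the first four bytes of the file, independent of any split offset. */
    public static byte[] readHead(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[4];
            int n = 0;
            while (n < buf.length) {
                int r = in.read(buf, n, buf.length - n);
                if (r < 0) break; // file is shorter than four bytes
                n += r;
            }
            return Arrays.copyOf(buf, n);
        }
    }
}
```

With something like this, every task that opens a split pays for one extra small read at offset 0 of its file. That is the cost I would like us to weigh against doing the check once during split generation.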

[ Full content available at: https://github.com/apache/flink/pull/6710 ]