-=| p...@cpan.org, 14.11.2019 09:51:20 +0100 |=-
> On Wednesday 13 November 2019 20:37:06 Damyan Ivanov wrote:
> > my($buffer, $string) = ("", "");
> > while (read($fh, $buffer, 256, length($buffer))) {
> >     $string .= decode($encoding, $buffer, Encode::FB_QUIET);
> >     # $buffer now contains the unprocessed partial character
> > }
>
> This code is dangerous. It can enter into endless loop. Once you read
> invalid UTF-8 sequence, above loop never finish. So if buffer input is
> under user/attacker control you introduce DoS issues.

Sure. A check to prevent that would be in order. I must admit that I
was very happy to find a solution to the problem that was even in the
official documentation.

> Instead of FB_QUIET, you should use Encode::STOP_AT_PARTIAL flag. This
> is the flag which you want to use. Encode::decode stops decoding when
> valid UTF-8 sequence is not complete and needs more bytes to read. And
> by default invalid UTF-8 sequences are mapped to Unicode replacement
> character.
>
> Btw, PerlIO::encoding uses also Encode::STOP_AT_PARTIAL flag to handle
> this situation.
>
> PS: I know that Encode::STOP_AT_PARTIAL is undocumented, but it is only
> because nobody found time to write documentation for it. It is part of
> Encode API and ready to use...

That would be https://rt.cpan.org/Public/Bug/Display.html?id=67065
(filed 8 years ago, still open).
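
For the record, here is a minimal sketch of the same loop with
Encode::STOP_AT_PARTIAL in place of FB_QUIET, as suggested above. The
flag is undocumented, so the behaviour described in the comments is an
assumption based on the description in this thread. The helper name
slurp_decoded is made up for illustration, and appending U+FFFD for a
character truncated at end of input is my own choice, not something
Encode does for me.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of Encode): decode everything readable
# from $fh, carrying an incomplete trailing character over to the next read.
sub slurp_decoded {
    my ($fh, $encoding) = @_;
    my ($buffer, $string) = ("", "");

    # Append each chunk after whatever was left over from the last decode.
    while (read($fh, $buffer, 256, length($buffer))) {
        # Assumption: with STOP_AT_PARTIAL, decode() consumes the processed
        # bytes from $buffer and leaves only an incomplete trailing sequence;
        # invalid sequences become U+FFFD instead of staying in the buffer.
        $string .= Encode::decode($encoding, $buffer, Encode::STOP_AT_PARTIAL());
    }

    # Bytes still left at EOF are a truncated character; mark it explicitly.
    $string .= "\x{FFFD}" if length($buffer);
    return $string;
}
```

Since invalid input is replaced rather than kept, $buffer should never
hold more than one partial character between reads, which is what
removes the endless-loop concern raised above.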