On 07/20/2016 06:21 AM, Pádraig Brady wrote:
> It's worth considering having a separate (already existing?) util
> to fix data before processing. That could have options to:
>   drop invalid chars, replace with replacement char,
>   apply various http://unicode.org/reports/tr15/#Norm_Forms,
>   convert enclosed forms like ㊷ to 42 etc.
> I.E. we should avoid complicating each util where possible,
> and at least avoid having options on each util that could be
> hoisted to a more general util like above.
>
> Silently dropping invalid characters probably isn't a great idea,
> and warnings to stderr is a bit messy and could be seen to contradict
> POSIX which suggests exiting with failure if anything output to stderr.
> A compromise might be to just replace invalid chars with
> the replacement character � and then include that in
> normal character processing, to make issues in input apparent.

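To make the behaviours mentioned above concrete (dropping invalid input, failing on it, or substituting the replacement character), here is a rough sketch in Python rather than C. It only illustrates the semantics and assumes UTF-8 input. The policy names "discard", "fail" and "replace" and the helper decode_with_policy() are made up for this example and are not existing coreutils options.

```python
# Hypothetical policy names mapped onto Python's standard codec error handlers.
POLICIES = {
    "discard": "ignore",   # silently drop invalid bytes
    "fail": "strict",      # raise UnicodeDecodeError on the first invalid byte
    "replace": "replace",  # substitute U+FFFD for invalid input
}


def decode_with_policy(data: bytes, policy: str) -> str:
    """Decode UTF-8 bytes, handling invalid sequences per the chosen policy."""
    return data.decode("utf-8", errors=POLICIES[policy])


# "café", a space, one invalid byte (0xFF), then " ok".
sample = b"caf\xc3\xa9 \xff ok"

if __name__ == "__main__":
    for name in ("discard", "replace", "fail"):
        try:
            # ascii() keeps the output printable regardless of the locale.
            print(name, ascii(decode_with_policy(sample, name)))
        except UnicodeDecodeError as err:
            print(name, "rejected input:", err.reason)
```

With "replace" the bad byte stays visible in the output as U+FFFD, which is the point of the compromise quoted above. With "discard" the damage disappears without a trace.
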
Since there are several plausible error-handling methods (silently
discard invalid input, flag input as invalid with an error and no
further output, convert invalid input into replacement character and
proceed with output), all of which can be considered desirable in some
circumstances, I wonder if we should give ALL utilities a common
--encoding-error=POLICY option that allows runtime selection between the
three policies, and/or an environment variable that selects the default
policy in absence of a command line choice.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
