stevedlawrence opened a new pull request, #982: URL: https://github.com/apache/daffodil/pull/982
Currently, if the primitive type is an integer, text number parsing disallows decimal points, even if the pattern contains one. Instead, when parsing integers, we should allow decimals as long as the fractional part is zero. And when unparsing, we should unparse a decimal point with a zero fractional part according to the pattern.

This changes the behavior so integer parsing uses the same DecimalFormat configuration as non-integer parsing (i.e. decimals are allowed), but we throw a parse error if the fractional part that was parsed is non-zero. This also means that unparsing integers now outputs decimal points according to the pattern.

Additionally, if textNumberCheckPolicy is strict, we set ICU setDecimalPatternMatchRequired to true so that we allow or disallow decimal points in the data depending on whether the pattern does or does not have a decimal point. Note that lax parsing always allows decimal points regardless of the pattern. For this reason, we now always require the grouping/decimal separator DFDL properties in lax mode.

One bug was discovered in ICU (ICU-22303) where, if we require the decimal point because strict mode is enabled, ICU never parses the infinity/NaN representation. A workaround is added to manually check for these representations until this bug is fixed. ICU unit tests are also added which should fail if ICU fixes this bug so we can remove this workaround.

Make sure we always specify infinity and NaN representations from the DFDL schema for all primitives, not just for xs:double/xs:float. There is no way to disable infinity/NaN ICU parsing, so if we do not specify these values ICU just uses the locale values, which could lead to unwanted locale-specific behavior. Relatedly, this modifies NodeInfo types so that fromNumber fails for types that do not support infinity/NaN (i.e. everything except Double/Float) and creates a parse error.
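
For illustration only, the integer behavior described above can be sketched as follows. This is not the Daffodil implementation: it uses the JDK's java.text.DecimalFormat as a stand-in for ICU's DecimalFormat, and the class and method names are hypothetical. The JDK class has no equivalent of setDecimalPatternMatchRequired, so the sketch behaves like lax mode: a decimal point is accepted regardless of the pattern.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical helper, not Daffodil code. java.text.DecimalFormat stands in
// for the ICU DecimalFormat that the change actually configures.
class IntegerTextNumberSketch {

    private static DecimalFormat formatFor(String pattern) {
        // Fixed symbols ('.' decimal, ',' grouping) so results do not depend
        // on the default locale.
        DecimalFormat df =
            new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.ROOT));
        // Finite values come back as BigDecimal, so any fractional part is kept.
        df.setParseBigDecimal(true);
        return df;
    }

    // Parse text as an integer, accepting a decimal point only when the
    // fractional part is zero.
    static BigInteger parseInteger(String pattern, String text) {
        ParsePosition pos = new ParsePosition(0);
        Number n = formatFor(pattern).parse(text, pos);
        if (n == null || pos.getIndex() != text.length()) {
            throw new IllegalArgumentException("Parse error: not a number: " + text);
        }
        if (!(n instanceof BigDecimal)) {
            // With setParseBigDecimal(true), only infinity/NaN are returned as Double.
            throw new IllegalArgumentException("Parse error: infinity/NaN is not an integer: " + text);
        }
        BigDecimal bd = (BigDecimal) n;
        if (bd.remainder(BigDecimal.ONE).signum() != 0) {
            throw new IllegalArgumentException("Parse error: non-zero fractional part: " + text);
        }
        return bd.toBigIntegerExact();
    }

    // Unparse an integer; the pattern decides whether ".0" is written.
    static String unparseInteger(String pattern, BigInteger value) {
        return formatFor(pattern).format(value);
    }
}
```

With the pattern "#0.0", parseInteger accepts "42.0" and rejects "42.5", and unparseInteger writes the value 42 back out as "42.0".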
Modifies virtual decimal logic to ensure we handle numbers that do not fit in a Long (should work) or that contain decimal points (should be a parse error).

Tests are updated so that if they want to differentiate between int and decimal depending on whether a decimal point exists in the data, they must specify a pattern with or without a decimal point and enable strict mode. Lax mode allows a decimal point regardless of type, so it cannot differentiate the types.

DAFFODIL-2158
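
As a rough sketch of the two virtual decimal cases mentioned above (values larger than a Long, and data that contains a decimal point), assuming the number of digits after the virtual decimal point has already been taken from the pattern. The class and method names are hypothetical, and a plain digit check replaces the ICU parsing that Daffodil actually uses.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical helper, not Daffodil code.
class VirtualDecimalSketch {

    // scale is the assumed number of digits to the right of the virtual
    // decimal point, e.g. 2 for a pattern such as "000V00".
    static BigDecimal parseVirtualDecimal(String text, int scale) {
        // The data must not contain a real decimal point (or anything else
        // besides an optional sign and digits).
        if (!text.matches("-?[0-9]+")) {
            throw new IllegalArgumentException(
                "Parse error: virtual decimal data must be digits only: " + text);
        }
        // BigInteger has no size limit, so values beyond Long.MAX_VALUE work.
        BigInteger unscaled = new BigInteger(text);
        // Interpret the digits as unscaled * 10^(-scale).
        return new BigDecimal(unscaled, scale);
    }
}
```

For example, parseVirtualDecimal("12345", 2) yields 123.45, a 30-digit input is scaled the same way without overflow, and "123.45" is rejected.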
