dan-s1 commented on PR #7194: URL: https://github.com/apache/nifi/pull/7194#issuecomment-1527776319
@exceptionfactory I found the discrepancy, and it is not what I thought: it depends on java.util.Locale, not on the operating system.

To use the org.apache.nifi.util.SchemaInferenceUtil.getDataType method for inferring the correct data type, I needed the cell value as a string. To get it I used org.apache.poi.ss.usermodel.DataFormatter, which I hoped would take care of any formatting and give the best possible inference, especially when the cell value is a timestamp or a big integer. Only now do I realize that this class relies heavily on java.util.Locale, which determines how numbers and dates are formatted. Hence the difference across the CI builds, which run in different locales.

The field that differs holds a number in scientific notation. In the FR locale it is rendered as, for example, 9,8765E+08, while the other locales render it as 9.8765E+08. The underlying NiFi code that parses numbers does not recognize the comma as a decimal separator, so it infers the field's data type as a string rather than a float.

I still believe the choice I made for the inference logic is correct, since POI handles all of the necessary formatting. It seems, though, that I should probably use simpler data that works across all locales. Please let me know how you would like me to proceed.
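The locale effect described above can be reproduced with the JDK alone. The sketch below is only an illustration under stated assumptions: it uses java.text.DecimalFormat as a stand-in for POI's DataFormatter and Double.parseDouble as a stand-in for NiFi's number parsing, and the class and method names are hypothetical. Note that DecimalFormat does not print the "+" sign in the exponent that Excel-style formatting shows.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

class LocaleScientificNotationSketch {

    // Hypothetical helper: renders a value in scientific notation using the
    // decimal separator of the given locale (stand-in for DataFormatter).
    static String format(double value, Locale locale) {
        DecimalFormat formatter =
                new DecimalFormat("0.0000E00", DecimalFormatSymbols.getInstance(locale));
        return formatter.format(value);
    }

    // Hypothetical stand-in for a locale-unaware numeric check: it accepts
    // only a dot as the decimal separator, so a comma makes the text
    // look like a plain string.
    static boolean looksLikeNumber(String text) {
        try {
            Double.parseDouble(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        double value = 987_650_000d;
        for (Locale locale : new Locale[] {Locale.US, Locale.FRANCE}) {
            String text = format(value, locale);
            System.out.println(locale + " -> " + text
                    + " (parsed as number: " + looksLikeNumber(text) + ")");
        }
    }
}
```

Run as is, the same value should come out dot-separated for en_US and comma-separated for fr_FR, and only the former passes the numeric check. One way to keep a test independent of the build machine's default locale is to pin the locale explicitly when formatting; DataFormatter has a constructor that takes a Locale for this purpose, though whether that fits the inference logic here is a separate question.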
