dan-s1 commented on PR #7194:
URL: https://github.com/apache/nifi/pull/7194#issuecomment-1527776319

   @exceptionfactory I was able to find the discrepancy and it is not what I 
thought. It is actually dependent on the java.util.Locale and not the operating 
system. In order to use the 
org.apache.nifi.util.SchemaInferenceUtil.getDataType method for inferring the 
correct data type, I needed to get the cell value as a string. To do that I 
used org.apache.poi.ss.usermodel.DataFormatter, which I hoped would take care 
of any formatting and give the best inference possible, especially if the cell 
value was a timestamp or perhaps a big integer. Only now do I realize that the 
class relies heavily on java.util.Locale, which determines how numbers and 
dates are formatted. Hence the difference across the CI builds, which run in 
different locales. 
The particular field that differs contains a number in scientific notation. In 
the FR locale it is rendered as, for example, 9,8765E+08, while the other 
locales render it as 9.8765E+08. The underlying NiFi code that parses numbers 
does not recognize a comma as a decimal separator, so it infers the field's 
data type as a string rather than a float. 
I still believe the choice I made for the inference logic is correct, as POI 
handles all of the necessary formatting. However, I should probably use 
simpler data that works across all locales. Please let me know how you 
would like me to proceed.
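
   To illustrate the effect, here is a minimal sketch that uses only 
java.text rather than POI. The class name `LocaleSciNotationDemo`, its helper 
methods, and the pattern `0.0000E00` are hypothetical and chosen for 
illustration; `java.text.DecimalFormat` does not emit the `+` in the exponent 
that the spreadsheet format shows, and `Double.parseDouble` stands in here for 
the NiFi number parsing, which I am assuming behaves similarly with respect to 
commas.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class LocaleSciNotationDemo {

    // Format a value in scientific notation using the given locale's symbols.
    // The pattern is an assumption for illustration, not the one POI applies.
    static String format(double value, Locale locale) {
        DecimalFormat format =
                new DecimalFormat("0.0000E00", DecimalFormatSymbols.getInstance(locale));
        return format.format(value);
    }

    // Stand-in for a locale-unaware numeric check: true only if the text
    // is accepted by Double.parseDouble, which expects '.' as the separator.
    static boolean parsesAsDouble(String text) {
        try {
            Double.parseDouble(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        double value = 987650000d;
        for (Locale locale : new Locale[] {Locale.US, Locale.FRANCE}) {
            String text = format(value, locale);
            System.out.println(locale + " -> " + text
                    + " (parses as double: " + parsesAsDouble(text) + ")");
        }
    }
}
```

   The same value formats with a decimal point under Locale.US and with a 
decimal comma under Locale.FRANCE, and only the former passes the 
locale-unaware check. That matches the string-versus-float difference seen 
across the CI builds.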
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
