Joe,

I am definitely in favor of #3. I think if we perform some sort of binary 
operation on two numbers,
we should return a Decimal type if either of the operands is a Decimal and a 
Number (Long) type
if both operands are Longs. We would also need a toDecimal() method that would 
convert a Number
type to a Decimal type and we would need to support calling toNumber() on a 
Decimal as well.
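
A minimal sketch of that promotion rule, assuming Decimal is backed by a 
Java Double and Number by a Long. The class and method names here are 
hypothetical and are not taken from the existing EL code:

```java
class NumericPromotion {
    // Hypothetical binary operation: the result is a Double (Decimal) if
    // either operand is a Double, and a Long (Number) only if both are Longs.
    static Number add(Number a, Number b) {
        if (a instanceof Double || b instanceof Double) {
            return Double.valueOf(a.doubleValue() + b.doubleValue());
        }
        return Long.valueOf(a.longValue() + b.longValue());
    }

    // Explicit conversions the user could call to override the inferred type.
    static Double toDecimal(Number n) {
        return Double.valueOf(n.doubleValue());
    }

    // Note: longValue() on a Double truncates toward zero.
    static Long toNumber(Number n) {
        return Long.valueOf(n.longValue());
    }

    public static void main(String[] args) {
        System.out.println(add(2L, 3L));    // both Longs, result stays a Long
        System.out.println(add(2L, 0.5));   // one Double, result is a Double
        System.out.println(toNumber(2.9));  // explicit override back to Long
    }
}
```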

Given this, though, I think it makes sense to infer the return type and allow 
the user to explicitly
change it via toNumber() and toDecimal() calls if need be. With the addition of 
these functions, I don't
think it would be 'shady' to interpret the types at all. We already do 
interpret types in many cases, for
instance when we have an expression such as ${num1:divide(${num2}) } -- in this 
case, num1 and num2
are attributes, so they are strings. We already implicitly convert them to 
numbers. Going forward, we should
convert them to either Number or Decimal type based on the regex that they 
match. I believe this also
addresses the concern of not being able to override. I don't think backward 
compatibility is an issue here
either, as we currently would just fail and throw an Exception if attempting to 
divide a string like "2.3".
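
To illustrate the regex-based inference, here is a rough sketch. The two 
patterns and the class name are assumptions for illustration only; the real 
regexes would need to be settled during implementation:

```java
import java.util.regex.Pattern;

class AttributeTypeInference {
    // Assumed patterns: optional sign followed by digits for whole numbers,
    // and digits with a decimal point for decimals.
    private static final Pattern WHOLE = Pattern.compile("-?\\d+");
    private static final Pattern DECIMAL = Pattern.compile("-?(\\d+\\.\\d*|\\.\\d+)");

    // Converts a string attribute to a Long or a Double based on which
    // pattern it matches, and fails for anything else.
    static Number infer(String attribute) {
        String s = attribute.trim();
        if (WHOLE.matcher(s).matches()) {
            return Long.valueOf(s);
        }
        if (DECIMAL.matcher(s).matches()) {
            return Double.valueOf(s);
        }
        throw new NumberFormatException("Not a number: " + attribute);
    }

    public static void main(String[] args) {
        System.out.println(infer("7").getClass().getSimpleName());
        System.out.println(infer("2.3").getClass().getSimpleName());
    }
}
```

With something like this, "2.3" would become a Decimal rather than causing 
an Exception, while "7" would still be treated as a Number.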

Number 3 seems like a slam-dunk to me in terms of pros vs cons.

Thanks
-Mark


> On Apr 20, 2016, at 10:16 PM, Joe Percivall <[email protected]> 
> wrote:
> 
> Hello Dev list,
> I've been working on a couple improvements that deal with utilizing 
> expression language to do data analysis and this has exposed a couple issues 
> with the typing of numbers in Expression Language (EL). The improvements are 
> a continuation of the topic I presented on at the Apache NiFi MeetUp group in 
> MD[1]. 
> The primary issue I came across is that currently in EL all numbers are 
> interpreted as longs[2], which only store whole numbers. This leads to 
> problems when trying to do things like dividing whole numbers or just trying 
> to add/subtract decimals. I am actually surprised that the request for 
> decimals hasn't come up before. That being said, after some initial 
> discussion with Tony and Joe, I believe that there are four potential ways 
> forward.
> 1: Create a new EL type "decimal" backed by a double[3] and new methods to 
> support it ("add_decimal"): This allows the user to explicitly choose whether 
> or not they want to use decimal or whole numbers. It retains the simple 
> use-cases that use whole numbers while opening up the new use-cases using 
> decimals. One down side is that it is more verbose. It means adding a new 
> function for each math operation. This is backwards compatible.
> 
> 2: Back all numbers by doubles instead of longs: This is easy to implement 
> and retains very concise methods ("add", "subtract", etc.). A few cons: 
> doubles have a lower precision than longs[4], can lead to rounding errors[5] 
> and could be confusing for users who only work with whole numbers to suddenly 
> find decimals. This is not backwards compatible.
> 
> 3: Create a new EL type "decimal" backed by a double[3] and attempt to 
> smartly and automatically cast depending on method/input/output: This would 
> allow for the positives of having decimals and whole numbers in addition to 
> having concise methods. The main cons are a much longer implementation time 
> to do it right and the "shadiness" of doing things automatically for the 
> user. Also this would mean the user wouldn't have the option to explicitly 
> override. This is not backwards compatible.
> 
> 4: Create a new EL type "decimal" backed by a double[3] and overload the 
> existing methods with an additional parameter to specify return type to 
> support it: This would allow for the positives of having decimals and whole 
> numbers in addition to having concise method names but this may cause 
> confusion with less technical users who aren't used to specifying return 
> types. This is backwards compatible.
> 
> The options that are not backwards compatible would need to wait to be 
> implemented until 1.0.
> The current option I am leaning towards is number 1 due to the explicitness 
> and greater control it gives the user. While it is more verbose I think the 
> decimal vs whole number syntax will be easy for even non-technical users to 
> pick up. Also I currently have a PR for it up here[6].
> Any other ideas or suggestions are welcome!
> [1] http://www.meetup.com/ApacheNiFi/events/229158777/
> [2] https://docs.oracle.com/javase/7/docs/api/java/lang/Long.html
> [3] https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html
> [4] https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
> [5] http://stackoverflow.com/questions/960072/rounding-errors
> [6] https://issues.apache.org/jira/browse/NIFI-1662
> Joe
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: [email protected]
