Re: [Discuss] Expression Language Options to Add Decimal Support

Joe Percivall Mon, 25 Apr 2016 07:17:04 -0700

Pierre and Andy,

How do you guys feel about option 3 as Mark has explained it, keeping in mind 
that it would be backwards compatible? As it stands we have differing opinions, 
which is a good thing, but we need to reach an agreement in order to proceed.
 
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: [email protected]

On Thursday, April 21, 2016 11:12 AM, Joe Percivall 
<[email protected]> wrote:
Mark,

After thinking through this a bit more I believe you're right. We can assume 
this logic for every operation:

Two whole numbers as input returns a whole number
A whole number and a decimal as input returns a decimal
Two decimals as input returns a decimal
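
As a rough illustration of that rule, here is a minimal Java sketch. The class 
TypeInference and its add and isWhole methods are hypothetical names chosen for 
this example, not the actual EL implementation; the sketch only assumes that 
whole numbers are backed by Long and decimals by Double.

```java
// Hypothetical sketch of the inference rule; names are illustrative only.
final class TypeInference {

    /** Adds two operands, returning a Long only when both inputs are whole numbers. */
    static Number add(Number left, Number right) {
        if (isWhole(left) && isWhole(right)) {
            // Both operands are whole numbers: stay in long arithmetic.
            return left.longValue() + right.longValue();
        }
        // At least one operand is a decimal: promote both to double.
        return left.doubleValue() + right.doubleValue();
    }

    /** Treats Long and Integer as whole numbers; everything else is a decimal. */
    static boolean isWhole(Number n) {
        return n instanceof Long || n instanceof Integer;
    }

    public static void main(String[] args) {
        System.out.println(add(2L, 3L));   // 5   (whole + whole)
        System.out.println(add(2L, 0.5));  // 2.5 (whole + decimal)
        System.out.println(add(1.5, 0.5)); // 2.0 (decimal + decimal)
    }
}
```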

I believe one problem with your explanation as written is "...infer the return 
type and allow the user to explicitly change it via toNumber() and toDecimal() 
calls if need be". More specifically, you would need to call toNumber/toDecimal 
on the inputs in order to change the return type. You can't change the return 
type after the expression runs (without the potential loss of 
information/precision from converting between the two) because by the time the 
expression evaluator exits it has already interpreted the result as a long or 
double, which happens before the user is able to call toNumber/toDecimal. So 
the user would have to control the return type by calling 
toNumber/toDecimal on the inputs. 

That leads to the one confusing bit about this approach: a user may attempt 
an operation on two whole numbers and expect it to return a 
decimal (like 8/5 = 1.6). In order to get their expected output the user would 
have to know to explicitly convert at least one of them to a double by using 
the toDecimal operation. 
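
The same behavior shows up in plain Java long/double arithmetic, which is what 
the two types would be backed by. The sketch below is only an analogy: the 
class DivisionConversion and its two methods are made-up names, and the casts 
stand in for where a toDecimal call would sit relative to the division.

```java
// Illustrative analogy in plain Java; not EL syntax and not NiFi code.
final class DivisionConversion {

    /** Divides as longs, then converts the result: the fraction is already lost. */
    static double convertAfter(long a, long b) {
        return (double) (a / b);
    }

    /** Converts one input first, so the division itself is done on doubles. */
    static double convertBefore(long a, long b) {
        return (double) a / b;
    }

    public static void main(String[] args) {
        System.out.println(8L / 5L);             // 1
        System.out.println(convertAfter(8, 5));  // 1.0
        System.out.println(convertBefore(8, 5)); // 1.6
    }
}
```

Converting the result after the fact yields 1.0 rather than 1.6, which is why 
the conversion has to be applied to an input.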

That being said, it's probably worth it to keep down the verbosity and 
still have backwards compatibility. 

Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: [email protected]

On Thursday, April 21, 2016 10:20 AM, Mark Payne <[email protected]> wrote:

Joe,

I am definitely in favor of #3. I think if we perform some sort of binary 
operation on two numbers, we should return a Decimal type if either of the 
operands is a Decimal and a Number (Long) type if both operands are Longs. We 
would also need a toDecimal() method that would convert a Number type to a 
Decimal type and we would need to support calling toNumber() on a decimal as 
well.

Given this, though, I think it makes sense to infer the return type and allow 
the user to explicitly change it via toNumber() and toDecimal() calls if need 
be. With the addition of these functions, I don't think it would be 'shady' to 
interpret the types at all. We already do interpret types in many cases, for 
instance when we have an expression such as ${num1:divide(${num2}) } -- in this 
case, num1 and num2 are attributes, so they are strings. We already implicitly 
convert them to numbers. Going forward, we should convert them to either Number 
or Decimal type based on the regex that they match. I believe this also 
addresses the concern of not being able to override. I don't think backward 
compatibility is an issue here either, as we currently would just fail and 
throw an Exception if attempting to divide a string like "2.3".
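
A minimal Java sketch of that regex-based conversion might look like the 
following. The two patterns and the names LiteralClassifier and parse are 
assumptions made for illustration; the actual regexes would need to be agreed 
on.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: picks Long or Double for an attribute value by regex.
final class LiteralClassifier {

    // Assumed patterns for illustration only.
    private static final Pattern WHOLE = Pattern.compile("[-+]?\\d+");
    private static final Pattern DECIMAL = Pattern.compile("[-+]?\\d+\\.\\d+");

    /** Returns a Long for whole-number strings and a Double for decimal strings. */
    static Number parse(String value) {
        String trimmed = value.trim();
        if (WHOLE.matcher(trimmed).matches()) {
            return Long.parseLong(trimmed);
        }
        if (DECIMAL.matcher(trimmed).matches()) {
            return Double.parseDouble(trimmed);
        }
        // Neither pattern matched: fail, as a non-numeric string would today.
        throw new NumberFormatException("Not a number: " + value);
    }

    public static void main(String[] args) {
        System.out.println(parse("8"));   // 8   (Long)
        System.out.println(parse("2.3")); // 2.3 (Double)
    }
}
```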

Number 3 seems like a slam-dunk to me in terms of pros vs cons.

Thanks
-Mark

> On Apr 20, 2016, at 10:16 PM, Joe Percivall <[email protected]> 
> wrote:
> 
> Hello Dev list,
> I've been working on a couple improvements that deal with utilizing 
> expression language to do data analysis and this has exposed a couple issues 
> with the typing of numbers in Expression Language (EL). The improvements are 
> a continuation of the topic I presented on at the Apache NiFi MeetUp group in 
> MD[1]. 
> The primary issue I came across is that currently in EL all numbers 
> are interpreted as longs[2], which only store whole numbers. This leads to 
> problems when trying to do things like dividing whole numbers or just trying 
> to add/subtract decimals. I am actually surprised that the request for 
> decimals hasn't come up before. That being said, after some initial 
> discussion with Tony and Joe, I believe that there are four potential ways 
> forward.
> 1: Create a new EL type "decimal" backed by a double[3] and new methods to 
> support it ("add_decimal"): This allows the user to explicitly choose whether 
> or not they want to use decimal or whole numbers. It retains the simple 
> use-cases that use whole numbers while opening up the new use-cases using 
> decimals. One down side is that it is more verbose. It means adding a new 
> function for each math operation. This is backwards compatible.
> 
> 2: Back all numbers by doubles instead of longs: This is easy to implement and 
> retains very concise methods ("add", "subtract", etc.). A few cons: doubles 
> have a lower precision than longs[4], can lead to rounding errors[5], and 
> could be confusing for users who only work with whole numbers to suddenly 
> find decimals. This is not backwards compatible.
> 
> 3: Create a new EL type "decimal" backed by a double[3] and attempt to smartly 
> automatically cast depending on method/input/output: This would allow for the 
> positives of having decimals and whole numbers in addition to having concise 
> methods. The main cons are a much longer implementation time to do it right 
> and the "shadiness" of doing things automatically for the user. Also, this 
> would mean the user wouldn't have the option to explicitly override. This is 
> not backwards compatible.
> 
> 4: Create a new EL type "decimal" backed by a double[3] and overload the 
> existing methods with an additional parameter to specify return type to 
> support it: This would allow for the positives of having decimals and whole 
> numbers in addition to having concise method names but this may cause 
> confusion with less technical users who aren't used to specifying return 
> types. This is backwards compatible.
> 
> The options that are not backwards compatible would need to wait to be 
> implemented until 1.0.
> The current option I am leaning towards is number 1 due to the explicitness 
> and greater control it gives the user. While it is more verbose I think the 
> decimal vs whole number syntax will be easy for even non-technical users to 
> pick up. Also I currently have a PR for it up here[6].
> Any other ideas or suggestions are welcome!
> [1] http://www.meetup.com/ApacheNiFi/events/229158777/
> [2] https://docs.oracle.com/javase/7/docs/api/java/lang/Long.html
> [3] https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html
> [4] https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
> [5] http://stackoverflow.com/questions/960072/rounding-errors
> [6] https://issues.apache.org/jira/browse/NIFI-1662
> Joe
> - - - - - - 
> Joseph Percivall
> linkedin.com/in/Percivall
> e: [email protected]
