With explanations from Mark, option 3 is fine with me especially as it is backward compatible.
Pierre

2016-04-25 16:14 GMT+02:00 Joe Percivall <[email protected]>:

> Pierre and Andy,
>
> How do you guys feel about option 3 as Mark has explained it? Keeping in
> mind that it would be backwards compatible. As it stands we have differing
> opinions, which is a good thing, but we need to reach an agreement in order
> to proceed.
>
> Joe
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: [email protected]
>
> On Thursday, April 21, 2016 11:12 AM, Joe Percivall
> <[email protected]> wrote:
>
> Mark,
>
> After thinking through this a bit more I believe you're right. We can
> assume this logic for every operation:
>
> Two whole numbers as input returns a whole number
> A whole number and a decimal as input returns a decimal
> Two decimals as input returns a decimal
>
> I believe one problem with your explanation as written is "...infer the
> return type and allow the user to explicitly change it via toNumber() and
> toDecimal() calls if need be". More specifically, you would need to call
> toNumber/toDecimal on the inputs in order to change the return type. You
> can't change the return type after the expression runs (without the
> potential loss of information/precision of converting between the two)
> because once the expression evaluator exits it needs to have already
> interpreted it as a long or double, which would occur before the user is
> able to call toNumber/toDecimal. So the user would have to control the
> return type by calling toNumber/toDecimal on the inputs.
>
> That leads to the one confusing bit about this approach: when a user
> attempts to do an operation on two whole numbers that they would expect to
> return a decimal (like 8/5 = 1.6). In order to get their expected output
> the user would have to know to explicitly convert at least one of them to a
> double by using the toDecimal operation.
>
> That being said, it's probably worth it to keep down the verbosity
> and still have backwards compatibility.
>
> Joe
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: [email protected]
>
> On Thursday, April 21, 2016 10:20 AM, Mark Payne <[email protected]>
> wrote:
>
> Joe,
>
> I am definitely in favor of #3. I think if we perform some sort of binary
> operation on two numbers, we should return a Decimal type if either of the
> operands is a Decimal and a Number (Long) type if both operands are Longs.
> We would also need a toDecimal() method that would convert a Number type
> to a Decimal type and we would need to support calling toNumber() on a
> decimal as well.
>
> Given this, though, I think it makes sense to infer the return type and
> allow the user to explicitly change it via toNumber() and toDecimal() calls
> if need be. With the addition of these functions, I don't think it would be
> 'shady' to interpret the types at all. We already do interpret types in
> many cases, for instance when we have an expression such as
> ${num1:divide(${num2}) } -- in this case, num1 and num2 are attributes, so
> they are strings. We already implicitly convert them to numbers. Going
> forward, we should convert them to either Number or Decimal type based on
> the regex that they match. I believe this also addresses the concern of not
> being able to override. I don't think backward compatibility is an issue
> here either, as we currently would just fail and throw an Exception if
> attempting to divide a string like "2.3".
>
> Number 3 seems like a slam-dunk to me in terms of pros vs cons.
>
> Thanks
> -Mark
>
> On Apr 20, 2016, at 10:16 PM, Joe Percivall
> <[email protected]> wrote:
>
> Hello Dev list,
>
> I've been working on a couple improvements that deal with utilizing
> expression language to do data analysis and this has exposed a couple
> issues with the typing of numbers in Expression Language (EL). The
> improvements are a continuation of the topic I presented on at the Apache
> NiFi MeetUp group in MD[1].
>
> The primary issue I came across is that currently in EL all numbers are
> interpreted as longs[2], which only store whole numbers. This leads to
> problems when trying to do things like dividing whole numbers or just
> trying to add/subtract decimals. I am actually surprised that the request
> for decimals hasn't come up before. That being said, after some initial
> discussion with Tony and Joe, I believe that there are four potential ways
> forward.
>
> 1: Create a new EL type "decimal" backed by a double[3] and new methods
> to support it ("add_decimal"): This allows the user to explicitly choose
> whether or not they want to use decimal or whole numbers. It retains the
> simple use-cases that use whole numbers while opening up the new use-cases
> using decimals. One downside is that it is more verbose. It means adding a
> new function for each math operation. This is backwards compatible.
>
> 2: Back all numbers by doubles instead of longs: This is easy to implement
> and retains very concise methods ("add", "subtract", etc.). A few cons:
> doubles have a lower precision than longs[4], can lead to rounding
> errors[5] and could be confusing for users who only work with whole numbers
> to suddenly find decimals. This is not backwards compatible.
>
> 3: Create a new EL type "decimal" backed by a double[3] and attempt to
> smartly automatically cast depending on method/input/output: This would
> allow for the positives of having decimals and whole numbers in addition to
> having concise methods. The main cons being a much longer implementation
> time to do it right and the "shadiness" of doing things automatically for
> the user. Also this would mean the user wouldn't have the option to
> explicitly override. This is not backwards compatible.
>
> 4: Create a new EL type "decimal" backed by a double[3] and overload the
> existing methods with an additional parameter to specify return type to
> support it: This would allow for the positives of having decimals and whole
> numbers in addition to having concise method names but this may cause
> confusion with less technical users who aren't used to specifying return
> types. This is backwards compatible.
>
> The options that are not backwards compatible would need to wait to be
> implemented until 1.0.
>
> The current option I am leaning towards is number 1 due to the
> explicitness and greater control it gives the user. While it is more
> verbose I think the decimal vs whole number syntax will be easy for even
> non-technical users to pick up. Also I currently have a PR for it up
> here[6].
>
> Any other ideas or suggestions are welcome!
>
> [1] http://www.meetup.com/ApacheNiFi/events/229158777/
> [2] https://docs.oracle.com/javase/7/docs/api/java/lang/Long.html
> [3] https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html
> [4] https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
> [5] http://stackoverflow.com/questions/960072/rounding-errors
> [6] https://issues.apache.org/jira/browse/NIFI-1662
>
> Joe
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: [email protected]
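
For readers following the thread, the inference rules agreed on for option 3 can be sketched in Java roughly as below. This is only an illustration under stated assumptions, not NiFi's implementation: the class ElNumberSketch, its method names, and the two regexes are hypothetical (the thread only says attributes would be converted "based on the regex that they match"). It shows an attribute string becoming a Long or a Double, a binary operation returning a whole number only when both operands are whole, and why toDecimal has to be applied to an input to get 8/5 = 1.6.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of option 3's type inference; not NiFi code. */
final class ElNumberSketch {

    // Assumed patterns; the thread does not specify the actual regexes.
    private static final Pattern WHOLE = Pattern.compile("[-+]?\\d+");
    private static final Pattern DECIMAL = Pattern.compile("[-+]?(\\d+\\.\\d*|\\.\\d+)");

    /** Converts an attribute string to a Long (whole number) or a Double (decimal). */
    static Number parse(String raw) {
        String s = raw.trim();
        if (WHOLE.matcher(s).matches()) {
            return Long.valueOf(s);
        }
        if (DECIMAL.matcher(s).matches()) {
            return Double.valueOf(s);
        }
        throw new NumberFormatException("Not a number: " + raw);
    }

    /** Whole / whole stays whole; any decimal operand makes the result a decimal. */
    static Number divide(Number a, Number b) {
        if (a instanceof Long && b instanceof Long) {
            return a.longValue() / b.longValue(); // integer division, boxed as Long
        }
        return a.doubleValue() / b.doubleValue(); // floating-point division, boxed as Double
    }

    /** Explicit conversion of a value to the decimal type. */
    static Double toDecimal(Number n) {
        return n.doubleValue();
    }

    /** Explicit conversion of a value to the whole-number type (truncates toward zero). */
    static Long toNumber(Number n) {
        return n.longValue();
    }

    public static void main(String[] args) {
        // Two whole inputs: the fractional part is lost before the caller sees the result.
        System.out.println(divide(parse("8"), parse("5")));
        // Converting one input first yields a decimal result.
        System.out.println(divide(toDecimal(parse("8")), parse("5")));
        // Converting the result afterwards cannot recover the lost fraction.
        System.out.println(toDecimal(divide(parse("8"), parse("5"))));
    }
}
```

The last call in main mirrors the point made in the 11:12 AM reply: once the whole-number division has run, calling toDecimal on its result only re-labels the already truncated value, so the conversion has to happen on the inputs.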
