Pierre and Andy, How do you guys feel about option 3 as Mark has explained it? Keeping in mind that it would be backwards compatible. As it stands we have differing opinions, which is a good thing, but we need to reach an agreement in order to proceed. Joe - - - - - - Joseph Percivall linkedin.com/in/Percivall e: [email protected]
On Thursday, April 21, 2016 11:12 AM, Joe Percivall <[email protected]> wrote: Mark, After thinking through this a bit more I believe you're right. We can assume this logic for every operation: Two whole numbers as input returns a whole number A whole number and a decimal as input returns a decimal Two decimals as input returns a Decimal I believe, one problem with your explanation as written is "...infer the return type and allow the user to explicitly change it via toNumber() and toDecimal() calls if need be". More specifically you would need to call toNumber/toDecimal on the inputs in order to change the return type. You can't change the return type after the expression runs (without the potential loss of information/precision of converting between the two) because once the expression evaluator exits it needs to have already interpreted it as a long or double which would occur before the user is able to call toNumber/toDecimal. So the user would have to control the return type would by calling toNumber/toDecimal on the inputs. That leads to the one confusing bit about this approach, when a user attempts to do an operation on two whole numbers that they would expect to return a decimal (like 8/5 = 1.6). In order to get their expected output the user would have to know to explicitly convert at least one of them to a double by using the toDecimal operation. That being said it's probably that's worth it to keep down the verbosity and still have backwards compatibility. Joe - - - - - - Joseph Percivall linkedin.com/in/Percivall e: [email protected] On Thursday, April 21, 2016 10:20 AM, Mark Payne <[email protected]> wrote: Joe, I am definitely in favor of #3. I think if we perform some sort of binary operation on two numbers, we should return a Decimal type if either of the operands is a Decimal and a Number (Long) type if both operands are Longs. We would also need a toDecimal() method that would convert a Number type ot a Decimal type and we would need to support calling toNumber() on a decimal as well. Given this, though, I think it makes sense to infer the return type and allow the user to explicitly change it via toNumber() and toDecimal() calls if need be. With the addition of these functions, I don't think it would be 'shady' to interpret the types at all. We already do interpret types in many cases, for instance when we have an expression such as ${num1:divide(${num2}) } -- in this case, num1 and num2 are attributes, so they are strings. We already implicitly convert them to numbers. Going forward, we should convert them to either Number or Decimal type based on the regex that they match. I believe this also addresses the concern of not being able to override. I don't think backward compatibility is an issue here either, as we currently would just fail and throw an Exception if attempting to divide a string like "2.3" Number 3 seems like a slam-dunk to me in terms of pros vs cons. Thanks -Mark > On Apr 20, 2016, at 10:16 PM, Joe Percivall <[email protected]> > wrote: > > Hello Dev list, > I've been working on a couple improvements that deal with utilizing > expression language to do data analysis and this has exposed a couple issues > with the typing of numbers in Expression Language (EL). The improvements are > a continuation of the topic I presented on at the Apache NiFi MeetUp group in > MD[1]. > The primary issue I came across is that currently in EL all numbers > interpreted as longs[2], which only store whole numbers. This leads to > problems when trying to do things like dividing whole numbers or just trying > to add/subtract decimals. I am actually surprised that the request for > decimals this hasn't come up before. That being said, after some initial > discussion with Tony and Joe, I believe that there are four potential ways > forward. > 1: Create a new EL type "decimal" backed by a double[3] and new methods to > support it ("add_decimal"): This allows the user to explicitly choose whether > or not they want to use decimal or whole numbers. It retains the simple > use-cases that use whole numbers while opening up the new use-cases using > decimals. One down side is that it is more verbose. It means adding a new > function for each math operation. This is backwards compatible. > > 2: Back all numbers by doubles instead of longs: The easy to implement and > retains very concise methods ("add", "subtract", etc..). A few cons, doubles > have a lower precision than longs[4], can lead to rounding errors[5] and > could be confusing for users who only work with whole numbers to suddenly > find decimals.This is not backwards compatible. > 3: Create a new EL type "decimal" back by a double[3] and attempt to smartly > automatically cast depending on method/input/output: This would allow for the > positives of having decimals and whole numbers in addition to having concise > methods. The main cons being a much longer implementation time to do it right > and the "shadiness" of doing things automatically for the user. Also this > would mean the user wouldn't have the option to explicitly override This is > not backwards compatible. > 4: Create a new EL type "decimal" backed by a double[3] and overload the > existing methods with an additional parameter to specify return type to > support it: This would allow for the positives of having decimals and whole > numbers in addition to having concise method names but this may cause > confusion with less technical users who aren't used to specifying return > types. This is backwards compatible. > > The options that are not backwards compatible would need to wait to be > implemented until 1.0. > The current option I am leaning towards is number 1 due to the explicitness > and greater control it gives the user. While it is more verbose I think the > decimal vs whole number syntax will be easy for even non-technical users to > pick up. Also I currently have a PR for it up here[6]. > Any other ideas or suggestions are welcome! > [1] http://www.meetup.com/ApacheNiFi/events/229158777/ > [2] https://docs.oracle.com/javase/7/docs/api/java/lang/Long.html > [3] https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html > [4] https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html > [5] http://stackoverflow.com/questions/960072/rounding-errors > [6] https://issues.apache.org/jira/browse/NIFI-1662 > Joe- - - - - - Joseph Percivalllinkedin.com/in/Percivalle: > [email protected]
