Re: [Discuss] Expression Language Options to Add Decimal Support

Joe Percivall Mon, 25 Apr 2016 19:40:06 -0700

Thanks for going through each of your concerns Andy and I will address each:



1: I agree that option 3 would be most concise and would address this loss of 
precision.
2: There are a couple points here:
        1. Using decimal points: currently decimal points are reserved 
characters in expression language[1] but I'm not sure that they are used 
anywhere (I only see two matches for "DOT", in the lexer and parser). I'm not 
entirely sure how prioritization would work when adding the Regex to match a 
decimal point for a double. In the PR I have open[2]  (uses option 1 and will 
be closed) I explicitly require quoting the decimal to allow easier parsing. 
Also of note, you can only pass a quoted String or attribute name as the 
expression value subject (the thing getting a method called on it). So 
“${1.2:toNumber()}” would not be valid but “${"1.2":toNumber()}” would be.
        2. Evaluation of String which would parse to a double: When you run 
"${“1-2”:replace(“-“, “.”)}” you merely create a String with the contents 
"1.2". This is because the type only changes after being run through a method 
which returns a specific type.
        3. Long Term users: I agree that long term users would need to adjust 
their thinking but for a different reason. They have only ever used/had access 
to using whole numbers. This change would require users to change their 
thinking to think of new use-cases. If they don't want to adjust then they can 
continue only using whole numbers the exact same way they always have.
        4: User acceptance: I agree that specifying input type to get a 
specific output type will be something new to non-dev users and would require 
good documentation with good examples. That being said I think the added 
learning curve is worth the added functionality. 

[1] 
https://github.com/apache/nifi/blob/32398693d5917e35d3a6cd1549edd795e16fa8ec/nifi-commons/nifi-expression-language/src/main/antlr3/org/apache/nifi/attribute/expression/language/antlr/AttributeExpressionLexer.g#L80-L80
[2] https://github.com/apache/nifi/pull/310

Joe

- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: [email protected]



On Monday, April 25, 2016 9:31 PM, Andy LoPresto <[email protected]> wrote:



Mark made some good points in his support of option 3. I’ll try to list some of 
the scenarios I’m worried about, and if we can address each, I’m fine with that 
choice. For the rest of this message, “user” refers to the person writing the 
EL expression. They may or may not be a developer, but have read our existing 
EL documentation. 

1. Loss of precision — currently there is an issue where division of Long 
values can result in a loss of precision. I have provided a failing test here 
[1]. I believe this would be resolved by any of the proposed solutions, but the 
auto-infer mechanism of option 3 would be the most concise. 
2. Auto-casting — I’m not sure how the “auto-interpretation” would occur. Would 
it only happen on the result of mathematical operations? Would 
“${1.2:toNumber()}” return “1” as a Long or “1.2” as a Decimal? What about 
“${“1-2”:replace(“-“, “.”)}”? Is this a String with contents “1.2” or a Number 
(either Long or double)? The current “single numeric type” language has (I 
imagine) conditioned users to associate any numerical value with the Number 
type. I believe splitting decimal values out to “Decimal” type is correct, but 
will require very clear documentation and probably a concerted education effort 
for long-time users. As Joe pointed out, if a user wanted a decimal result of 
division, they would need to specify a decimal dividend or divisor, rather than 
operate on the quotient. This is accepted practice by developers, but is 
unlikely to be familiar with non-developers (i.e. writing 
“${“10”:divide(“3”):toDecimal()}” would result in “3”; 
“${“10”:divide(“3”:toDecimal())}” or ${“10”:toDecimal():divide(“3”)}” are 
necessary to get a result of “3.33…”)

I think as long as the user experience and concise and explicit documentation 
is held in paramount, I am ok with option 3. I just see the most potential for 
user confusion there, especially after some recent mailing list questions about 
the expression language functions and documentation. 

[1] 
https://github.com/alopresto/nifi/blob/el_double/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy#L108






Andy LoPresto
[email protected]
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

On Apr 25, 2016, at 7:22 AM, Pierre Villard <[email protected]> wrote:
>
>With explanations from Mark, option 3 is fine with me especially as it is
>backward compatible.
>
>Pierre
>
>2016-04-25 16:14 GMT+02:00 Joe Percivall <[email protected]>:
>
>
>Pierre and Andy,
>>
>>How do you guys feel about option 3 as Mark has explained it? Keeping in
>>mind that it would be backwards compatible. As it stands we have differing
>>opinions, which is a good thing, but we need to reach an agreement in order
>>to proceed.
>>
>>Joe
>>- - - - - -
>>Joseph Percivall
>>linkedin.com/in/Percivall
>>e: [email protected]
>>
>>
>>
>>
>>On Thursday, April 21, 2016 11:12 AM, Joe Percivall
>><[email protected]> wrote:
>>Mark,
>>
>>
>>After thinking through this a bit more I believe you're right. We can
>>assume this logic for every operation:
>>
>>Two whole numbers as input returns a whole number
>>A whole number and a decimal as input returns a decimal
>>Two decimals as input returns a Decimal
>>
>>I believe, one problem with your explanation as written is "...infer the
>>return type and allow the user to explicitly change it via toNumber() and
>>toDecimal() calls if need be". More specifically you would need to call
>>toNumber/toDecimal on the inputs in order to change the return type. You
>>can't change the return type after the expression runs (without the
>>potential loss of information/precision of converting between the two)
>>because once the expression evaluator exits it needs to have already
>>interpreted it as a long or double which would occur before the user is
>>able to call toNumber/toDecimal. So the user would have to control the
>>return type would by calling toNumber/toDecimal on the inputs.
>>
>>That leads to the one confusing bit about this approach, when a user
>>attempts to do an operation on two whole numbers that they would expect to
>>return a decimal (like 8/5 = 1.6). In order to get their expected output
>>the user would have to know to explicitly convert at least one of them to a
>>double by using the toDecimal operation.
>>
>>That being said it's probably that's worth it to keep down the verbosity
>>and still have backwards compatibility.
>>
>>
>>Joe
>>- - - - - - Joseph Percivall
>>linkedin.com/in/Percivall
>>e: [email protected]
>>
>>
>>
>>
>>On Thursday, April 21, 2016 10:20 AM, Mark Payne <[email protected]>
>>wrote:
>>
>>
>>
>>Joe,
>>
>>I am definitely in favor of #3. I think if we perform some sort of binary
>>operation on two numbers,
>>we should return a Decimal type if either of the operands is a Decimal and
>>a Number (Long) type
>>if both operands are Longs. We would also need a toDecimal() method that
>>would convert a Number
>>type ot a Decimal type and we would need to support calling toNumber() on
>>a decimal as well.
>>
>>Given this, though, I think it makes sense to infer the return type and
>>allow the user to explicitly
>>change it via toNumber() and toDecimal() calls if need be. With the
>>addition of these functions, I don't
>>think it would be 'shady' to interpret the types at all. We already do
>>interpret types in many cases, for
>>instance when we have an expression such as ${num1:divide(${num2}) } -- in
>>this case, num1 and num2
>>are attributes, so they are strings. We already implicitly convert them to
>>numbers. Going forward, we should
>>convert them to either Number or Decimal type based on the regex that they
>>match. I believe this also
>>addresses the concern of not being able to override. I don't think
>>backward compatibility is an issue here
>>either, as we currently would just fail and throw an Exception if
>>attempting to divide a string like "2.3"
>>
>>Number 3 seems like a slam-dunk to me in terms of pros vs cons.
>>
>>Thanks
>>-Mark
>>
>>
>>
>>
>>On Apr 20, 2016, at 10:16 PM, Joe Percivall
>>><[email protected]> wrote:
>>
>>
>>>Hello Dev list,
>>>I've been working on a couple improvements that deal with utilizing
>>>expression language to do data analysis and this has exposed a couple
>>issues with the typing of numbers in Expression Language (EL). The
>>improvements are a continuation of the topic I presented on at the Apache
>>NiFi MeetUp group in MD[1].
>>
>>The primary issue I came across is that currently in EL all numbers
>>>interpreted as longs[2], which only store whole numbers. This leads to
>>problems when trying to do things like dividing whole numbers or just
>>trying to add/subtract decimals. I am actually surprised that the request
>>for decimals this hasn't come up before. That being said, after some
>>initial discussion with Tony and Joe, I believe that there are four
>>potential ways forward.
>>
>>1: Create a new EL type "decimal" backed by a double[3] and new methods
>>>to support it ("add_decimal"): This allows the user to explicitly choose
>>whether or not they want to use decimal or whole numbers. It retains the
>>simple use-cases that use whole numbers while opening up the new use-cases
>>using decimals. One down side is that it is more verbose. It means adding a
>>new function for each math operation. This is backwards compatible.
>>
>>
>>>2: Back all numbers by doubles instead of longs: The easy to implement
>>>and retains very concise methods ("add", "subtract", etc..). A few cons,
>>doubles have a lower precision than longs[4],  can lead to rounding
>>errors[5] and could be confusing for users who only work with whole numbers
>>to suddenly find decimals.This is not backwards compatible.
>>
>>3: Create a new EL type "decimal" back by a double[3] and attempt to
>>>smartly automatically cast depending on method/input/output: This would
>>allow for the positives of having decimals and whole numbers in addition to
>>having concise methods. The main cons being a much longer implementation
>>time to do it right and the "shadiness" of doing things automatically for
>>the user. Also this would mean the user wouldn't have the option to
>>explicitly override  This is not backwards compatible.
>>
>>4: Create a new EL type "decimal" backed by a double[3] and overload the
>>>existing methods with an additional parameter to specify return type to
>>support it: This would allow for the positives of having decimals and whole
>>numbers in addition to having concise method names but this may cause
>>confusion with less technical users who aren't used to specifying return
>>types. This is backwards compatible.
>>
>>
>>>The options that are not backwards compatible would need to wait to be
>>>implemented until 1.0.
>>
>>The current option I am leaning towards is number 1 due to the
>>>explicitness and greater control it gives the user. While it is more
>>verbose I think the decimal vs whole number syntax will be easy for even
>>non-technical users to pick up. Also I currently have a PR for it up
>>here[6].
>>
>>Any other ideas or suggestions are welcome!
>>>[1] http://www.meetup.com/ApacheNiFi/events/229158777/
>>>[2] https://docs.oracle.com/javase/7/docs/api/java/lang/Long.html
>>>[3] https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html
>>>[4]
>>>https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
>>
>>[5] http://stackoverflow.com/questions/960072/rounding-errors
>>>[6] https://issues.apache.org/jira/browse/NIFI-1662
>>>Joe- - - - - - Joseph Percivalllinkedin.com/in/Percivalle:
>>>[email protected]
>>
>>

Re: [Discuss] Expression Language Options to Add Decimal Support

Reply via email to