https://issues.apache.org/bugzilla/show_bug.cgi?id=48292
--- Comment #8 from Petr.Udalau <[email protected]> 2009-12-18 06:36:34 UTC --- Josh, We appreciate your time and effort to integrate our stuff. >Thanks for all your work so far but I'm afraid it's going to take a few more >iterations before we are able to commit this patch. We understand it and ready to continue our work to match your requirements. >The tests needed to be converted to junit 3.8 because that's what POI uses >(changing junit versions is probably something we'd decide to do project wide). > The test classes were re-named to follow POI conventions, though the names >could still be more descriptive. >POI expects tests to be silent (not write to std-out or any log) when >successful. >Test code should not throw exceptions to be caught by other test code during >successful tests. The only exceptions that should be thrown are ones >deliberately caused in the production code being tested. >One test method for every example spreadsheet cell is probably too many. Some >simplification was done, but consider using techniques like those in >TestFormulasFromSpreadsheet and TestIndexFunctionFromSpreadsheet etc. We reworked our tests to fit your requirements. >General coding/formatting guidelines: > - please use unix line endings (LF instead of CRLF) > - don't mix indent style (tabs or spaces) in the same file > - always use {} for the bodies of if/else for/while statements > - don't assign to parameters > - turn on IDE warnings for unnecessary code We are in process of improving our code. >I made significant simplifications to ArrayEval (while keeping all your junit >tests working). >I got rid of ArrayEval.illegalForAggregation because there is a much simpler >way to get #N/A out of formulas with array size errors: The #N/A value >originates in ArrayFormulaEvaluatorHelper.getArrayValue() (new method), and >just flows up the evaluation stack through standard techniques. >The method ArrayEval.arrayAsArea() is also not needed (due to introduction of >TwoDEval). TwoDEval simplifies a lot of the code you needed to touch in >existing function implementations. The above is reasonable improvement. Thanks. >There are also some problems in your interpretation of how array formulas work. > I am pretty sure that at least one of the test cases is asserting the wrong >result (testOffsetArray() - ConstantArray.xls C149). I have added a new test >class TestArrayEvaluationExtra which shows what the correct behaviour should >be. The new failing test cases are currently disabled. While investigating, I >have found out that array eval elements cannot be cell references or areas. So >in this example, OFFSET cannot return an array of cell references. I've >disabled ArrayEval from doing this, and made other fixes to keep all test cases >working. The solution will probably involve getting the RVA types of the >tokens correct during the formula parsing phase. Read the OOO documentation >for an introduction to this. Unfortunately I think that information is >incomplete, but it is the best resource I know of. We will probably have to >discover all the missing details for ourselves. The issue you have raised is really exists. We spent quite a time researching it. 1. Behavior of function chain based on array of references (could be return by function offset(), index() etc.) is different in Excel 2003 and Excel 2007 (I mean engines, not formats). For example =SUM(SUM((OFFSET(A149:B150,{1,0},{1,0},{1,1},{1,1}))) returns 5 in Excel 2003 and #N/A in Excel 2007. In excel 2003 =average((OFFSET(A149:B150,{1,0},{1,0},{1,1},{1,1})) returns 4 but =sum(average((OFFSET(A149:B150,{1,0},{1,0},{1,1},{1,1}))) returns 2.5. We think that reason is in inconsistency of excel concepts. We decided to stick to Excel 2007 logic. First of all we assume that offset() returns array of references – if create an array formula {=OFFSET(A149:B150,{1,0},{1,0},{1,1},{1,1})} on two cells, it would fill it with 4 and 1 correspondingly. The behavior we should match is the following one: a. =SUM(OFFSET(A149:B150,{1,0},{1,0},{1,1},{1,1})) ----> 4 b. =SUM(SUM(OFFSET(A149:B150,{1,0},{1,0},{1,1},{1,1}))) ----> #N/A! (It is very strange result, result of functions have to depend on only args) c. =SQRT(OFFSET(A149:B150,{1,0},{1,0},{1,1},{1,1})) ----> 2 d. =SUM(SQRT(OFFSET(A149:B150,{1,0},{1,0},{1,1},{1,1}))) -------> #Value! e. =SQRT(SUM(OFFSET(A149:B150,{1,0},{1,0},{1,1},{1,1}))) ---------> 2 f. =SUM(SQRT(SUM(OFFSET(A149:B150,{1,0},{1,0},{1,1},{1,1})))) --------> #N/A! 2. The logic above mathematically seems strange but need to emulate it. To do this we add componentError as element on ArrayEval which contains „future” error to be exposed by aggregation function used second time. 3. We consider ArrayEval as „universal” container which is transfered via evaluation stack and need to contain results of any types including references. Thus OFFSET(A149:B150,{1,0},{1,0},{1,1},{1,1})) returns array of references (ArrayEval contains AreaEvals) and {OFFSET(A149:B150,{1,0},{1,0},{1,1},{1,1}))} on two cells returns 4 and 1. We have removed your validation from ArrayEval. >I also found that using the "Evaluate Formula" button (available in Excel 2007) >helped me understand how some of the example formulas should be evaluated. Thanks for the hint >The short-circuit-if evaluation stuff is probably not needed. The first >parameter to IF has type 'V' and I have a feeling that any parameter of type >'V' should always have a scalar ValueEval passed at runtime. If you pass an >array for a parameter of type 'V' then the whole function gets invoked once for >each item in the array. You have already written some code (prepareArgsForLoop >etc) that does this. It's just a question of having it be invoked at the right >time. We need this change because IF first parameter is array. In this case IF optimization not always accepatble. We do iteration over array but in non-optimized manner. >There are a few other problems in the code: Don't instantiate ArrayEval until you have its contents calculated. (i.e. don't create half-initialised objects). We’ll look at it. > FunctionWithArraySupport.supportArray() looks like it is needed for >answering questions about function parameters which are probably resolvable by >the existing FunctionMetadata. Similar problems with ArrayMode - it's probably >providing an incomplete solution to something that can perhaps be solved better >using proper RVA token classification. Our investigation showed that RVA types from FunctionMetadata is not enough for decision if function accept array as argument or we need to iterate over it. V and A types are OK but problem is with R type. For example for IF function is 1 IF 2 3 R V R R But function logic requires second/third parameters needed to be iterated (as scalar values). The same is true for COLUMN, CHOOSE, VLOOKUP, HLOOKUP. Thus, for the moment we stick to our approach. There is some unnecessary code which is commented. This code is represents that specificaiotns differs from our interfaces. There is also problem with determining return type of formula for example for formulas: =SUM(IF(A67:B68>2;A67:B68;1)) =SUM(OFFSET(A149:B150,{1,0},{1,0},{1,1},{1,1})) OFFSET metadata 78 OFFSET 3 5 R R V V V V x According to the specification: arguments that marked "V" in specification represented like an array and "R"-arguments represented like an area eval. Return type of functions is "R" in specification but OFFSET returns single value for SUM(by preparing args for one itaration of loop) and IF returns array for SUM. So, we make one-week investigation and you can see result. Our patch is raw but set/remove functionality is done, there is some problems with evaluation but we can't find right specification how to evaluate formulas(and even Excel have problems with evaluations of array formulas) We will waiting for your response and your point of view on the problem of array formulas evaluation in Excel. We have to determine way of evaluation in POI: we have to imitate Excel calculations or find some rules of calculations of formulas(may be using RVA specification). Regards. -- Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug. --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
