Re: Evaluating expressions outside a cell value context

Javen O'Neal Sun, 12 Feb 2017 15:23:22 -0800

I haven't read through all your changes on GitHub yet, but all the
changes so far look good. I have a few other suggestions to
deduplicate some code using SheetUtil and FormulaShifter, but those
changes can be made at a later date if needed.


You should be able to use git-svn to push your changes. Read through
and improve our git documentation [1] if necessary.

[1] https://poi.apache.org/guidelines.html#Approach+3+-+the+git+way

On Fri, Feb 10, 2017 at 12:10 AM, Greg Woolsey <[email protected]> wrote:
> Well, I couldn't stand the incomplete support, so now this supports
> evaluating rules for all the different types, including range aggregates
> like "greater than 2 standard deviations" and "top 10".  Still doesn't
> provide help assigning partitioning buckets for icon sets and colors, but
> everything else is working.
>
> I filed a big bug with Vaadin, listing 5 core design problems I've found
> with their Conditional Formatting implementation, and offering my
> replacement for their code that uses the new POI evaluator instead.  They
> bit, and are interested, but I won't make my first commit some behemoth
> that hasn't received any feedback.  I know there are conventions and ideas
> I've missed :)
>
>  I need both sets of changes for my day job, so I'm all-in on doing it
> right in both directions and facilitating the conversations.
>
> On Tue, Feb 7, 2017 at 11:50 PM Greg Woolsey <[email protected]> wrote:
>
>> Now this fork also contains ConditionalSpreadsheetEvaluator, and related
>> code.  The unit test is essentially a stub, but tests one basic style for
>> proof of concept.
>>
>> I've actually implemented a version of Vaadin Spreadsheet that uses this
>> new code to see how it performs, and I'm quite happy with both the improved
>> performance (~50% faster than theirs) and feature coverage/accuracy.  I've
>> found 5 major bugs so far in what they did, most likely the result of the
>> complexity of the document structure and the fact that several key pieces
>> of information where still buried in the implementation classes, and hadn't
>> been surfaced yet to the SS interfaces.  I've done that in this branch also.
>>
>> My code here is my own, I didn't like anything I saw elsewhere enough to
>> copy it :)
>>
>> Evaluation currently doesn't support range-based conditions, such as
>> TOP_10, DUPLICATE, etc.  Those don't seem like they'd be that bad to do, if
>> someone wants to take a stab at them.  I don't need them (yet), so they
>> just evaluate to "false" with a TODO comment for now.
>>
>> Likewise, there is no code to report which partition bucket a cell falls
>> into when the condition type is one of the partitioned styles, 2,3 or 4
>> value buckets, gradient fill, etc.  The fact that the rule matches (based
>> on range) is available, the caller would need to evaluate the rule type and
>> see what lies beneath.
>>
>> I assume interested parties will take a look as they have time and
>> inclination.  I'm sure there are areas to discuss, beyond where to put the
>> curly braces :)  I left some comments as to alternate strategies for some
>> areas, where I opted for less change to existing classes as a starting
>> point, even if it means a switch...case here or there when a new method
>> could be added to an Enum class instead.
>>
>> Hopefully the new methods on the SS interfaces are deemed minor - the
>> values were already there in most cases, at least on one side or the other
>> (HSSF/XSSF), with a static default to use for the other one per MS
>> documentation.
>>
>> Greg
>>
>> On Wed, Feb 1, 2017 at 5:38 PM Greg Woolsey <[email protected]>
>> wrote:
>>
>> Oh, the primary class is o.a.p.ss.formula.DataValidationEvaluator
>>
>> On Wed, Feb 1, 2017 at 5:37 PM Greg Woolsey <[email protected]>
>> wrote:
>>
>> My GitHub branch now contains Data Validation code and unit tests.  The
>> test file DataValidationEvaluations.xlsx contains a large set of validation
>> examples, including one formula example that applies to a range of cells
>> and uses a relative formula.  The evaluation code has corresponding logic
>> to offset the relative formula Ptgs from the top left of the region.
>>
>> Every test is labeled in the file with column A as a description, column B
>> as the cell with validation, and column C the expected result, TRUE =
>> valid, FALSE = invalid.
>>
>> The unit test compares the POI validation result with the expected column,
>> failing on boolean mismatches.
>>
>> Have not had time to run all tests yet, but this should only be code
>> additions, not modifications.  I'll run them soon.
>>
>> I'm sure there are code style discussions to be had - for example I
>> implemented some things as inner classes for now, but we may want them
>> top-level instead.
>>
>> Comments welcome, this is early code but is built on top of the SS
>> interfaces, so should be stable for HSSF and XSSF.
>>
>> Greg
>>
>> On Mon, Jan 30, 2017 at 9:55 AM Greg Woolsey <[email protected]>
>> wrote:
>>
>> Also, I just found this sample workbook
>> <http://download.microsoft.com/download/1/6/F/16F701E9-63BA-48D3-8B48-096F9288F443/AF010235700_en-us_cfsamples_af010235700.xlsx>
>>  in
>> the Excel online support docs.  If I have time to turn that into a unit
>> test, it's about as complete as we could want.  Some parts are lost saving
>> as HSSF, but we can then test that we evaluate what remains the same way as
>> newer Excel when opening a legacy formatted file.
>>
>> On Mon, Jan 30, 2017 at 9:38 AM Greg Woolsey <[email protected]>
>> wrote:
>>
>> Thanks, that makes sense wrt custom implementations of FormulaEvaluator -
>> I hadn't thought about anyone rolling their own, but it's an interface, so
>> quite possible.  Too bad we can't require Java 8 yet and use default
>> methods.
>>
>> I can work with the new *Evaluator class idea.  And the HSSF limitations
>> will just mean more unit tests :)  I have Excel 2016 available so I can
>> create test workbooks, save them as both XLSX and XLS, and compare
>> evaluations.  I can then write unit tests based on them that expect the
>> results seen in Excel.  That should give us reference points for confidence
>> in our replication of their logic, especially around rule priority/order
>> and XLS HSSF files.
>>
>> On Fri, Jan 27, 2017 at 11:05 PM Nick Burch <[email protected]> wrote:
>>
>> On Sat, 28 Jan 2017, Greg Woolsey wrote:
>> > As noted in one of the method JavaDocs, we also need to expose and make
>> use
>> > of the ConditionalFormattingRule "priority" attribute.  That's key to
>> > matching the right rule when more than one rule applies to a cell.  Only
>> > the first match in priority order is applied.
>>
>> Your slight challenge is that not all Conditional Formatting rules have a
>> priority... XLSX ones do, and newer XLS ones based on CFRule12Record (sid
>> = 0x087A) do, but the older XLS ones (CFRuleRecord / 0x01B1) don't. I'm
>> not sure what Excel does for those, but my hunch (based on our API) is
>> that it uses their order as a priority.
>>
>>
>> > I've created a fork in GitHub for this, and committed a stab at
>> > high-level API methods that could be added to the FormulaEvaluator
>> > interface:
>> >
>> https://github.com/WoozyG/poi/commit/d44fee7bd03ed450af589467ec90e2581b9f2b16$
>>
>> FormulaEvaluator is an interface, which we have 4 implementations of in
>> our codebabse, and I'd guess that other complex users of POI will have
>> dozens more. I'm not sure, therefore, that we want to be putting all of
>> the CF and DV logic methods on there, especially as it'll be common to all
>> implementations
>>
>> The HSSF classes for CF all use org.apache.poi.ss.formula.Formula which is
>> PTG based. The HSSF classes for DV seem to store the raw PTGs.
>>
>> If we added two new SS usermodel classes, eg
>> ConditionalFormattingEvaluator and DataValidationEvaluator, these could be
>> classes (not interfaces) with your proposed new methods on. They could
>> hold the logic (once) for all formats (as it's basically the same on all)
>> for priority, checking etc
>>
>> Doing that would also mean that "our" new classes could call out to our
>> existing low-level ones to evaluate formulas. That would mean we wouldn't
>> have to make a breaking change to the FormulaEvaluator interface too
>>
>> Might that work for you?
>>
>> > No implementations have been done yet, and the Vaadin comments indicate
>> > HSSF doesn't parse conditional formatting properly or something, and
>> can't
>> > be evaluated correctly currently.  I don't know exactly what they found
>> > wrong, and it's rather annoying they didn't file any bugs.
>>
>> I think that comment is out of date, from before the CF work in 3.13
>>
>> Nick
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: Evaluating expressions outside a cell value context

Reply via email to