Now this fork also contains ConditionalSpreadsheetEvaluator, and related
code.  The unit test is essentially a stub, but tests one basic style for
proof of concept.

I've actually implemented a version of Vaadin Spreadsheet that uses this
new code to see how it performs, and I'm quite happy with both the improved
performance (~50% faster than theirs) and feature coverage/accuracy.  I've
found 5 major bugs so far in what they did, most likely the result of the
complexity of the document structure and the fact that several key pieces
of information where still buried in the implementation classes, and hadn't
been surfaced yet to the SS interfaces.  I've done that in this branch also.

My code here is my own, I didn't like anything I saw elsewhere enough to
copy it :)

Evaluation currently doesn't support range-based conditions, such as
TOP_10, DUPLICATE, etc.  Those don't seem like they'd be that bad to do, if
someone wants to take a stab at them.  I don't need them (yet), so they
just evaluate to "false" with a TODO comment for now.

Likewise, there is no code to report which partition bucket a cell falls
into when the condition type is one of the partitioned styles, 2,3 or 4
value buckets, gradient fill, etc.  The fact that the rule matches (based
on range) is available, the caller would need to evaluate the rule type and
see what lies beneath.

I assume interested parties will take a look as they have time and
inclination.  I'm sure there are areas to discuss, beyond where to put the
curly braces :)  I left some comments as to alternate strategies for some
areas, where I opted for less change to existing classes as a starting
point, even if it means a switch...case here or there when a new method
could be added to an Enum class instead.

Hopefully the new methods on the SS interfaces are deemed minor - the
values were already there in most cases, at least on one side or the other
(HSSF/XSSF), with a static default to use for the other one per MS
documentation.

Greg

On Wed, Feb 1, 2017 at 5:38 PM Greg Woolsey <[email protected]> wrote:

> Oh, the primary class is o.a.p.ss.formula.DataValidationEvaluator
>
> On Wed, Feb 1, 2017 at 5:37 PM Greg Woolsey <[email protected]>
> wrote:
>
> My GitHub branch now contains Data Validation code and unit tests.  The
> test file DataValidationEvaluations.xlsx contains a large set of validation
> examples, including one formula example that applies to a range of cells
> and uses a relative formula.  The evaluation code has corresponding logic
> to offset the relative formula Ptgs from the top left of the region.
>
> Every test is labeled in the file with column A as a description, column B
> as the cell with validation, and column C the expected result, TRUE =
> valid, FALSE = invalid.
>
> The unit test compares the POI validation result with the expected column,
> failing on boolean mismatches.
>
> Have not had time to run all tests yet, but this should only be code
> additions, not modifications.  I'll run them soon.
>
> I'm sure there are code style discussions to be had - for example I
> implemented some things as inner classes for now, but we may want them
> top-level instead.
>
> Comments welcome, this is early code but is built on top of the SS
> interfaces, so should be stable for HSSF and XSSF.
>
> Greg
>
> On Mon, Jan 30, 2017 at 9:55 AM Greg Woolsey <[email protected]>
> wrote:
>
> Also, I just found this sample workbook
> <http://download.microsoft.com/download/1/6/F/16F701E9-63BA-48D3-8B48-096F9288F443/AF010235700_en-us_cfsamples_af010235700.xlsx>
>  in
> the Excel online support docs.  If I have time to turn that into a unit
> test, it's about as complete as we could want.  Some parts are lost saving
> as HSSF, but we can then test that we evaluate what remains the same way as
> newer Excel when opening a legacy formatted file.
>
> On Mon, Jan 30, 2017 at 9:38 AM Greg Woolsey <[email protected]>
> wrote:
>
> Thanks, that makes sense wrt custom implementations of FormulaEvaluator -
> I hadn't thought about anyone rolling their own, but it's an interface, so
> quite possible.  Too bad we can't require Java 8 yet and use default
> methods.
>
> I can work with the new *Evaluator class idea.  And the HSSF limitations
> will just mean more unit tests :)  I have Excel 2016 available so I can
> create test workbooks, save them as both XLSX and XLS, and compare
> evaluations.  I can then write unit tests based on them that expect the
> results seen in Excel.  That should give us reference points for confidence
> in our replication of their logic, especially around rule priority/order
> and XLS HSSF files.
>
> On Fri, Jan 27, 2017 at 11:05 PM Nick Burch <[email protected]> wrote:
>
> On Sat, 28 Jan 2017, Greg Woolsey wrote:
> > As noted in one of the method JavaDocs, we also need to expose and make
> use
> > of the ConditionalFormattingRule "priority" attribute.  That's key to
> > matching the right rule when more than one rule applies to a cell.  Only
> > the first match in priority order is applied.
>
> Your slight challenge is that not all Conditional Formatting rules have a
> priority... XLSX ones do, and newer XLS ones based on CFRule12Record (sid
> = 0x087A) do, but the older XLS ones (CFRuleRecord / 0x01B1) don't. I'm
> not sure what Excel does for those, but my hunch (based on our API) is
> that it uses their order as a priority.
>
>
> > I've created a fork in GitHub for this, and committed a stab at
> > high-level API methods that could be added to the FormulaEvaluator
> > interface:
> >
> https://github.com/WoozyG/poi/commit/d44fee7bd03ed450af589467ec90e2581b9f2b16$
>
> FormulaEvaluator is an interface, which we have 4 implementations of in
> our codebabse, and I'd guess that other complex users of POI will have
> dozens more. I'm not sure, therefore, that we want to be putting all of
> the CF and DV logic methods on there, especially as it'll be common to all
> implementations
>
> The HSSF classes for CF all use org.apache.poi.ss.formula.Formula which is
> PTG based. The HSSF classes for DV seem to store the raw PTGs.
>
> If we added two new SS usermodel classes, eg
> ConditionalFormattingEvaluator and DataValidationEvaluator, these could be
> classes (not interfaces) with your proposed new methods on. They could
> hold the logic (once) for all formats (as it's basically the same on all)
> for priority, checking etc
>
> Doing that would also mean that "our" new classes could call out to our
> existing low-level ones to evaluate formulas. That would mean we wouldn't
> have to make a breaking change to the FormulaEvaluator interface too
>
> Might that work for you?
>
> > No implementations have been done yet, and the Vaadin comments indicate
> > HSSF doesn't parse conditional formatting properly or something, and
> can't
> > be evaluated correctly currently.  I don't know exactly what they found
> > wrong, and it's rather annoying they didn't file any bugs.
>
> I think that comment is out of date, from before the CF work in 3.13
>
> Nick
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

Reply via email to