Re: Evaluating expressions outside a cell value context

Javen O'Neal Thu, 23 Feb 2017 21:55:40 -0800

Just reading through some of the open POI bugs.
Looks like this work is related to
https://bz.apache.org/bugzilla/show_bug.cgi?id=58131
This bug should be updated, possibly resolved, with the work you've done here.


On Mon, Feb 13, 2017 at 7:51 AM, Greg Woolsey <[email protected]> wrote:
> What I meant was external implementations of the Sheet interface.
> Everything inside POI that I've done in my branch is fully implemented with
> tests and JavaDocs.
>
> On Sun, Feb 12, 2017 at 7:32 PM Javen O'Neal <[email protected]> wrote:
>
> Anything that is still stubbed can be annotated with @NotImplemented, which
> will add the corresponding note to the JavaDocs.
>
> On Feb 12, 2017 17:08, "Greg Woolsey" <[email protected]> wrote:
>
>> Thanks for taking the time to inspect it, and point me to the utilities, I
>> hadn't found them yet.  I'll incorporate them and run all the tests
>> including my new ones.
>>
>> Are folks OK with this as part of 3.16?  I'd like that.  There are some
> new
>> methods on some interfaces, so if there are custom implementations of
>> things like Sheet, they would need changes.  Returning null or false
> should
>> be fine though, as the only callers are the new evaluator classes, and
>> anyone using those would need the new API methods anyway.
>>
>> On Sun, Feb 12, 2017, 5:22 PM Javen O'Neal <[email protected]> wrote:
>>
>> > I haven't read through all your changes on GitHub yet, but all the
>> > changes so far look good. I have a few other suggestions to
>> > deduplicate some code using SheetUtil and FormulaShifter, but those
>> > changes can be made at a later date if needed.
>> >
>> > You should be able to use git-svn to push your changes. Read through
>> > and improve our git documentation [1] if necessary.
>> >
>> > [1] https://poi.apache.org/guidelines.html#Approach+3+-+the+git+way
>> >
>> > On Fri, Feb 10, 2017 at 12:10 AM, Greg Woolsey <[email protected]>
>> > wrote:
>> > > Well, I couldn't stand the incomplete support, so now this supports
>> > > evaluating rules for all the different types, including range
>> aggregates
>> > > like "greater than 2 standard deviations" and "top 10".  Still doesn't
>> > > provide help assigning partitioning buckets for icon sets and colors,
>> but
>> > > everything else is working.
>> > >
>> > > I filed a big bug with Vaadin, listing 5 core design problems I've
>> found
>> > > with their Conditional Formatting implementation, and offering my
>> > > replacement for their code that uses the new POI evaluator instead.
>> They
>> > > bit, and are interested, but I won't make my first commit some
> behemoth
>> > > that hasn't received any feedback.  I know there are conventions and
>> > ideas
>> > > I've missed :)
>> > >
>> > >  I need both sets of changes for my day job, so I'm all-in on doing it
>> > > right in both directions and facilitating the conversations.
>> > >
>> > > On Tue, Feb 7, 2017 at 11:50 PM Greg Woolsey <[email protected]>
>> > wrote:
>> > >
>> > >> Now this fork also contains ConditionalSpreadsheetEvaluator, and
>> related
>> > >> code.  The unit test is essentially a stub, but tests one basic style
>> > for
>> > >> proof of concept.
>> > >>
>> > >> I've actually implemented a version of Vaadin Spreadsheet that uses
>> this
>> > >> new code to see how it performs, and I'm quite happy with both the
>> > improved
>> > >> performance (~50% faster than theirs) and feature coverage/accuracy.
>> > I've
>> > >> found 5 major bugs so far in what they did, most likely the result of
>> > the
>> > >> complexity of the document structure and the fact that several key
>> > pieces
>> > >> of information where still buried in the implementation classes, and
>> > hadn't
>> > >> been surfaced yet to the SS interfaces.  I've done that in this
> branch
>> > also.
>> > >>
>> > >> My code here is my own, I didn't like anything I saw elsewhere enough
>> to
>> > >> copy it :)
>> > >>
>> > >> Evaluation currently doesn't support range-based conditions, such as
>> > >> TOP_10, DUPLICATE, etc.  Those don't seem like they'd be that bad to
>> > do, if
>> > >> someone wants to take a stab at them.  I don't need them (yet), so
>> they
>> > >> just evaluate to "false" with a TODO comment for now.
>> > >>
>> > >> Likewise, there is no code to report which partition bucket a cell
>> falls
>> > >> into when the condition type is one of the partitioned styles, 2,3 or
>> 4
>> > >> value buckets, gradient fill, etc.  The fact that the rule matches
>> > (based
>> > >> on range) is available, the caller would need to evaluate the rule
>> type
>> > and
>> > >> see what lies beneath.
>> > >>
>> > >> I assume interested parties will take a look as they have time and
>> > >> inclination.  I'm sure there are areas to discuss, beyond where to
> put
>> > the
>> > >> curly braces :)  I left some comments as to alternate strategies for
>> > some
>> > >> areas, where I opted for less change to existing classes as a
> starting
>> > >> point, even if it means a switch...case here or there when a new
>> method
>> > >> could be added to an Enum class instead.
>> > >>
>> > >> Hopefully the new methods on the SS interfaces are deemed minor - the
>> > >> values were already there in most cases, at least on one side or the
>> > other
>> > >> (HSSF/XSSF), with a static default to use for the other one per MS
>> > >> documentation.
>> > >>
>> > >> Greg
>> > >>
>> > >> On Wed, Feb 1, 2017 at 5:38 PM Greg Woolsey <[email protected]>
>> > >> wrote:
>> > >>
>> > >> Oh, the primary class is o.a.p.ss.formula.DataValidationEvaluator
>> > >>
>> > >> On Wed, Feb 1, 2017 at 5:37 PM Greg Woolsey <[email protected]>
>> > >> wrote:
>> > >>
>> > >> My GitHub branch now contains Data Validation code and unit tests.
>> The
>> > >> test file DataValidationEvaluations.xlsx contains a large set of
>> > validation
>> > >> examples, including one formula example that applies to a range of
>> cells
>> > >> and uses a relative formula.  The evaluation code has corresponding
>> > logic
>> > >> to offset the relative formula Ptgs from the top left of the region.
>> > >>
>> > >> Every test is labeled in the file with column A as a description,
>> > column B
>> > >> as the cell with validation, and column C the expected result, TRUE =
>> > >> valid, FALSE = invalid.
>> > >>
>> > >> The unit test compares the POI validation result with the expected
>> > column,
>> > >> failing on boolean mismatches.
>> > >>
>> > >> Have not had time to run all tests yet, but this should only be code
>> > >> additions, not modifications.  I'll run them soon.
>> > >>
>> > >> I'm sure there are code style discussions to be had - for example I
>> > >> implemented some things as inner classes for now, but we may want
> them
>> > >> top-level instead.
>> > >>
>> > >> Comments welcome, this is early code but is built on top of the SS
>> > >> interfaces, so should be stable for HSSF and XSSF.
>> > >>
>> > >> Greg
>> > >>
>> > >> On Mon, Jan 30, 2017 at 9:55 AM Greg Woolsey <[email protected]>
>> > >> wrote:
>> > >>
>> > >> Also, I just found this sample workbook
>> > >> <
>> > http://download.microsoft.com/download/1/6/F/16F701E9-63BA-
>> 48D3-8B48-096F9288F443/AF010235700_en-us_cfsamples_af010235700.xlsx>
>> > in
>> > >> the Excel online support docs.  If I have time to turn that into a
>> unit
>> > >> test, it's about as complete as we could want.  Some parts are lost
>> > saving
>> > >> as HSSF, but we can then test that we evaluate what remains the same
>> > way as
>> > >> newer Excel when opening a legacy formatted file.
>> > >>
>> > >> On Mon, Jan 30, 2017 at 9:38 AM Greg Woolsey <[email protected]>
>> > >> wrote:
>> > >>
>> > >> Thanks, that makes sense wrt custom implementations of
>> FormulaEvaluator
>> > -
>> > >> I hadn't thought about anyone rolling their own, but it's an
>> interface,
>> > so
>> > >> quite possible.  Too bad we can't require Java 8 yet and use default
>> > >> methods.
>> > >>
>> > >> I can work with the new *Evaluator class idea.  And the HSSF
>> limitations
>> > >> will just mean more unit tests :)  I have Excel 2016 available so I
>> can
>> > >> create test workbooks, save them as both XLSX and XLS, and compare
>> > >> evaluations.  I can then write unit tests based on them that expect
>> the
>> > >> results seen in Excel.  That should give us reference points for
>> > confidence
>> > >> in our replication of their logic, especially around rule
>> priority/order
>> > >> and XLS HSSF files.
>> > >>
>> > >> On Fri, Jan 27, 2017 at 11:05 PM Nick Burch <[email protected]>
>> > wrote:
>> > >>
>> > >> On Sat, 28 Jan 2017, Greg Woolsey wrote:
>> > >> > As noted in one of the method JavaDocs, we also need to expose and
>> > make
>> > >> use
>> > >> > of the ConditionalFormattingRule "priority" attribute.  That's key
>> to
>> > >> > matching the right rule when more than one rule applies to a cell.
>> > Only
>> > >> > the first match in priority order is applied.
>> > >>
>> > >> Your slight challenge is that not all Conditional Formatting rules
>> have
>> > a
>> > >> priority... XLSX ones do, and newer XLS ones based on CFRule12Record
>> > (sid
>> > >> = 0x087A) do, but the older XLS ones (CFRuleRecord / 0x01B1) don't.
>> I'm
>> > >> not sure what Excel does for those, but my hunch (based on our API)
> is
>> > >> that it uses their order as a priority.
>> > >>
>> > >>
>> > >> > I've created a fork in GitHub for this, and committed a stab at
>> > >> > high-level API methods that could be added to the FormulaEvaluator
>> > >> > interface:
>> > >> >
>> > >>
>> > https://github.com/WoozyG/poi/commit/d44fee7bd03ed450af589467ec90e2
>> 581b9f2b16$
>> > >>
>> > >> FormulaEvaluator is an interface, which we have 4 implementations of
>> in
>> > >> our codebabse, and I'd guess that other complex users of POI will
> have
>> > >> dozens more. I'm not sure, therefore, that we want to be putting all
>> of
>> > >> the CF and DV logic methods on there, especially as it'll be common
> to
>> > all
>> > >> implementations
>> > >>
>> > >> The HSSF classes for CF all use org.apache.poi.ss.formula.Formula
>> which
>> > is
>> > >> PTG based. The HSSF classes for DV seem to store the raw PTGs.
>> > >>
>> > >> If we added two new SS usermodel classes, eg
>> > >> ConditionalFormattingEvaluator and DataValidationEvaluator, these
>> could
>> > be
>> > >> classes (not interfaces) with your proposed new methods on. They
> could
>> > >> hold the logic (once) for all formats (as it's basically the same on
>> > all)
>> > >> for priority, checking etc
>> > >>
>> > >> Doing that would also mean that "our" new classes could call out to
>> our
>> > >> existing low-level ones to evaluate formulas. That would mean we
>> > wouldn't
>> > >> have to make a breaking change to the FormulaEvaluator interface too
>> > >>
>> > >> Might that work for you?
>> > >>
>> > >> > No implementations have been done yet, and the Vaadin comments
>> > indicate
>> > >> > HSSF doesn't parse conditional formatting properly or something,
> and
>> > >> can't
>> > >> > be evaluated correctly currently.  I don't know exactly what they
>> > found
>> > >> > wrong, and it's rather annoying they didn't file any bugs.
>> > >>
>> > >> I think that comment is out of date, from before the CF work in 3.13
>> > >>
>> > >> Nick
>> > >>
>> > >> ---------------------------------------------------------------------
>> > >> To unsubscribe, e-mail: [email protected]
>> > >> For additional commands, e-mail: [email protected]
>> > >>
>> > >>
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: [email protected]
>> > For additional commands, e-mail: [email protected]
>> >
>> >
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: Evaluating expressions outside a cell value context

Reply via email to