Unification of cell type convensions makes sense. Changing the logic to
manipulate cell type/value convenrsions via public methods is fine as long
as it does not break compatibility.

setCellType() was introduced in HSSFCell in POI 1.0 which was very
format-centric. It wasn't designed to mimic Excel's object model, it was
rather a usermodel that reflected our knowledge about the XLS Biff8 format
at that time.

Then XSSF came and implemenetd its own setCellType() semantics and I'm not
surprised that there is a divergence.

Regards,
Yegor

On Sat, Jan 5, 2019 at 12:17 AM Vladislav Galas <gal...@apache.org> wrote:

> Correct me if I'm wrong: all three said implementations should behave
> identically in the common field (say, SXSSF doen't support array functions
> for understandable reasons) as long as SpreadsheetVersion constraints
> aren't violated.
>
> However, I was trying to improved test coverage for *Cell class and found
> out that at times the implementations behave very differently.
> Understandably, this comes from three reasons:
> a) written by different people at different times
> b) lack of specifications on the interface
> c) bloated interfaces.
>
> So I could take the job of unifying the behavior. I forsee the objections
> like "we shouldn't break what users are accustomed to". Fair enough, but
> mainly for the "user-observable" part. I.e. values are set/got ok, formulae
> are evaluated. But then there are those differences in the methods users (I
> guess) don't actually use.
>
> Say, we have a formula with a string cached value and a formula:
> Cell cell = ...;
> cell.setCellFormula("\"foo\");
> cell.setCellValue(1);
> ok, now
> getCellType() == CellType.FORMULA,
> getCachedFormulaResultType() == CellType.NUMERIC, and
> getNumericCellValue() == 1.0.
>
> So far so good. But if we say
> cell.setCellType(CellType.NUMERIC), what will happen? Spoiler: anything
> can happen.
> For HSSF and SXSSF it will be FORMULA (why??)
> For XSSF it will be BLANK (why??)
>
> Such discrepancies may happen whenever implicit value/type casting is
> performed, i.e. for setCellValue, get*CellValue, setCellType and
> setCellFormula.
>
> So,
> 1. I can prepare a set of contracts for such functions and present them
> for discussion here. The passed through ones will be clearly stated in the
> Cell interface, tests updated (1 clause in the javadoc = 1 test),
> implementations fixed.
>
> 2. A group of stronger and more invasive specific proposals:
>   a) mark setCellType @Deprecated. It's not something you can do in Excel,
> nor the implicit value conversion logic is reliable. setCellType(FORMULA)
> is meaningless. On the contrary, setBlank() should be a part of the public
> interface because it has a clear meaning: delete both value and formula,
> and preserve the cell with its style.
>   b) make get*Value return the value only if the value type strictly
> corresponds to the method name. The evaluator performs conversions on its
> own, and I very much doubt that users rely on a conversion like "get the
> numeric value of the error code".
>   c) at last, remove FORMULA from CellType and deprecate
> getCachedFormulaResultType. Why should anyone be concerned with the formula
> while reading a value? get*Value work identically, be formula set or not. I
> suspect there may be a reason for current state, though, because HSSF and
> XSSF implementations are strongly tied to the underlying xml beans. But
> there's still a good chance this removal is possible.
>   d) probably, CellType._NONE can also be removed
>   e) it also may be a good idea to remove BLANK as well (simultaneously
> adding isBlank() method), because it's "not a value".
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
> For additional commands, e-mail: dev-h...@poi.apache.org
>
>

Reply via email to