Unification of cell type convensions makes sense. Changing the logic to manipulate cell type/value convenrsions via public methods is fine as long as it does not break compatibility.
setCellType() was introduced in HSSFCell in POI 1.0 which was very format-centric. It wasn't designed to mimic Excel's object model, it was rather a usermodel that reflected our knowledge about the XLS Biff8 format at that time. Then XSSF came and implemenetd its own setCellType() semantics and I'm not surprised that there is a divergence. Regards, Yegor On Sat, Jan 5, 2019 at 12:17 AM Vladislav Galas <gal...@apache.org> wrote: > Correct me if I'm wrong: all three said implementations should behave > identically in the common field (say, SXSSF doen't support array functions > for understandable reasons) as long as SpreadsheetVersion constraints > aren't violated. > > However, I was trying to improved test coverage for *Cell class and found > out that at times the implementations behave very differently. > Understandably, this comes from three reasons: > a) written by different people at different times > b) lack of specifications on the interface > c) bloated interfaces. > > So I could take the job of unifying the behavior. I forsee the objections > like "we shouldn't break what users are accustomed to". Fair enough, but > mainly for the "user-observable" part. I.e. values are set/got ok, formulae > are evaluated. But then there are those differences in the methods users (I > guess) don't actually use. > > Say, we have a formula with a string cached value and a formula: > Cell cell = ...; > cell.setCellFormula("\"foo\"); > cell.setCellValue(1); > ok, now > getCellType() == CellType.FORMULA, > getCachedFormulaResultType() == CellType.NUMERIC, and > getNumericCellValue() == 1.0. > > So far so good. But if we say > cell.setCellType(CellType.NUMERIC), what will happen? Spoiler: anything > can happen. > For HSSF and SXSSF it will be FORMULA (why??) > For XSSF it will be BLANK (why??) > > Such discrepancies may happen whenever implicit value/type casting is > performed, i.e. for setCellValue, get*CellValue, setCellType and > setCellFormula. > > So, > 1. I can prepare a set of contracts for such functions and present them > for discussion here. The passed through ones will be clearly stated in the > Cell interface, tests updated (1 clause in the javadoc = 1 test), > implementations fixed. > > 2. A group of stronger and more invasive specific proposals: > a) mark setCellType @Deprecated. It's not something you can do in Excel, > nor the implicit value conversion logic is reliable. setCellType(FORMULA) > is meaningless. On the contrary, setBlank() should be a part of the public > interface because it has a clear meaning: delete both value and formula, > and preserve the cell with its style. > b) make get*Value return the value only if the value type strictly > corresponds to the method name. The evaluator performs conversions on its > own, and I very much doubt that users rely on a conversion like "get the > numeric value of the error code". > c) at last, remove FORMULA from CellType and deprecate > getCachedFormulaResultType. Why should anyone be concerned with the formula > while reading a value? get*Value work identically, be formula set or not. I > suspect there may be a reason for current state, though, because HSSF and > XSSF implementations are strongly tied to the underlying xml beans. But > there's still a good chance this removal is possible. > d) probably, CellType._NONE can also be removed > e) it also may be a good idea to remove BLANK as well (simultaneously > adding isBlank() method), because it's "not a value". > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org > For additional commands, e-mail: dev-h...@poi.apache.org > >