reckart opened a new issue, #444:
URL: https://github.com/apache/uima-uimaj/issues/444

   **Is your feature request related to a problem? Please describe.**
   When inspecting or diffing CAS contents in tests, we frequently rely on a simple CSV stringification that:
   - offers no rich, human-friendly output (such as HTML) for easier visual inspection,
   - lacks configurable columns (anchor, covered text, indexed status),
   - produces unstable ordering for multi-valued/annotation features and ambiguous anchors,
   - offers no convenient way to exclude noisy features/types or to treat empty strings specially,
   - and forces long covered text into the output, making diffs noisy.
   
   I'm often frustrated when test failures produce long, hard‑to‑scan CAS dumps 
or when small, irrelevant differences (e.g., non-deterministic anchor numbering 
or list order) make comparisons brittle.
   
   **Describe the solution you'd like**
   Add an enhanced utility that renders a CAS as comparable text, with the following capabilities:
   - Output formats: Keep CSV but add an HTML renderer for nicer human-readable 
tables.
   - Configurable columns: enable/disable an anchor column, an indexed column, 
and a covered‑text column (with configurable max length and 
middle-abbreviation).
   - Anchor formatting: anchors include the type short name, optional annotation offsets, an optional sofa id, an optional indexing marker, and stable disambiguation suffixes for duplicate anchors; optionally support a feature hash suffix on the anchor.
   - Stable ordering: when multi‑valued features hold annotations, optionally sort them by begin (ascending), then end (descending), then type name to provide deterministic set‑like ordering.
   - Index awareness: mark FSs as indexed and optionally add a dedicated 
`<INDEXED>` column; use indexed status as a tie-breaker when ordering.
   - Exclusions: allow regex patterns to exclude specific features or types 
from rendering (cache regex compilation for performance).
   - Null/empty handling: configurable `nullValue`, and an option to treat 
empty strings as null so empty values don’t clutter diffs.
   - Multi‑valued rendering: robust handling of array/list FSs and primitive 
arrays, rendering them as bracketed lists; handle nested multi-valued 
structures recursively.
   - Rendering options: omit the XML declaration in HTML output and use minimal inline styling so the HTML is self-contained.
   - Public API knobs: setters/getters for all of the above flags so callers can tune output for different use cases (compact machine diffs vs. human inspection).
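
The ordering rule from the "Stable ordering" item could be expressed as a comparator chain. This is only a sketch under stated assumptions: `Span` and `AnnotationOrderSketch` are hypothetical stand-ins so the example runs without UIMA on the classpath; a real implementation would read begin, end, and type name from the annotations themselves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an annotation (not a UIMA class).
final class Span {
    private final int begin;
    private final int end;
    private final String typeName;

    Span(int begin, int end, String typeName) {
        this.begin = begin;
        this.end = end;
        this.typeName = typeName;
    }

    int getBegin() { return begin; }
    int getEnd() { return end; }
    String getTypeName() { return typeName; }

    @Override
    public String toString() {
        return typeName + "[" + begin + "-" + end + "]";
    }
}

final class AnnotationOrderSketch {
    // begin ascending, then end descending (longer spans first), then type name.
    static final Comparator<Span> STABLE_ORDER =
        Comparator.comparingInt(Span::getBegin)
            .thenComparing(Comparator.comparingInt(Span::getEnd).reversed())
            .thenComparing(Span::getTypeName);

    // Returns a sorted copy so the caller's list is left untouched.
    static List<Span> sorted(List<Span> spans) {
        List<Span> copy = new ArrayList<>(spans);
        copy.sort(STABLE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<Span> spans = Arrays.asList(
            new Span(0, 5, "Token"),
            new Span(0, 12, "Sentence"),
            new Span(6, 12, "Token"),
            new Span(0, 5, "Lemma"));
        sorted(spans).forEach(System.out::println);
    }
}
```

Because every key in the chain is derived from the annotation's content, the result does not depend on the order in which the values were stored in the list or array.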
   
   This produces a single stable, configurable comparable representation useful 
for both automated assertions and human debugging.
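
As an illustration of the covered-text and null/empty options listed above, the following sketch shows one way a cell value might be prepared. All names here (`CellRenderSketch`, `abbreviateMiddle`, `renderValue`) and the `...` marker are assumptions made for the example, not part of any existing API.

```java
final class CellRenderSketch {
    // Shortens text to at most maxLength characters by replacing the middle
    // with the marker, keeping the start and end of the covered text visible.
    static String abbreviateMiddle(String text, int maxLength, String marker) {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must not be negative");
        }
        if (text == null || text.length() <= maxLength) {
            return text;
        }
        if (maxLength <= marker.length()) {
            // No room for the marker plus context: fall back to a plain cut.
            return text.substring(0, maxLength);
        }
        int keep = maxLength - marker.length();
        int head = (keep + 1) / 2; // the head gets the extra character if keep is odd
        int tail = keep / 2;
        return text.substring(0, head) + marker + text.substring(text.length() - tail);
    }

    // Maps null (and, optionally, the empty string) to the configured null value.
    static String renderValue(String value, String nullValue, boolean emptyAsNull) {
        if (value == null || (emptyAsNull && value.isEmpty())) {
            return nullValue;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(abbreviateMiddle("The quick brown fox jumps", 11, "..."));
        System.out.println(renderValue("", "<NULL>", true));
    }
}
```

Abbreviating in the middle keeps both boundaries of a long span recognizable, which tends to matter more when scanning a diff than the text in between.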
   
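
The regex-based exclusions with cached compilation could look roughly like the sketch below. `ExclusionSketch` and its methods are hypothetical names for illustration; whether patterns should match the full feature/type name or only part of it is an open design choice, and full-name matching is assumed here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

final class ExclusionSketch {
    // Each regex string is compiled once and reused on later lookups.
    private final Map<String, Pattern> cache = new ConcurrentHashMap<>();
    private final List<String> excludePatterns;

    ExclusionSketch(List<String> excludePatterns) {
        this.excludePatterns = new ArrayList<>(excludePatterns);
    }

    // True if the given feature or type name fully matches any exclusion pattern.
    boolean isExcluded(String name) {
        for (String regex : excludePatterns) {
            Pattern pattern = cache.computeIfAbsent(regex, Pattern::compile);
            if (pattern.matcher(name).matches()) {
                return true;
            }
        }
        return false;
    }

    // Number of patterns compiled so far (exposed only to illustrate the caching).
    int cachedPatternCount() {
        return cache.size();
    }

    public static void main(String[] args) {
        ExclusionSketch exclusions =
            new ExclusionSketch(Arrays.asList(".*\\.internal\\..*", "sofa"));
        System.out.println(exclusions.isExcluded("org.example.internal.Debug"));
        System.out.println(exclusions.isExcluded("begin"));
    }
}
```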

