reckart opened a new pull request, #445:
URL: https://github.com/apache/uima-uimaj/pull/445

   **What's in the PR**
   - Enhance `CasToComparableText`: add HTML renderer (in addition to CSV).
   - Add configurable columns (`<ANCHOR>`, `<INDEXED>`, `<COVERED_TEXT>`) and 
`setMaxLengthCoveredText`.
   - Stable, disambiguated anchors: optional unique anchors, sofa id marker, 
indexed marker, optional anchor feature hash.
   - Deterministic ordering: sort annotation-valued multi-valued features; 
feature-hash tie-breaker; indexed-first tie-breaker.
   - Exclude features/types via regex patterns with compiled-pattern cache.
   - Treat empty strings as null; add configurable `nullValue`.
   - Robust multi-valued support: arrays and list types rendered recursively 
with primitive-array handling.
   - New configuration API (setters/getters) for rendering options (e.g., 
`setOmitXmlDeclaration`, `setAnchorFeatureHash`, `setUniqueAnchors`, etc.).
   - Update `CasToComparableTextTest` to cover HTML output, exclusions, 
ordering, anchor hashing and array/list rendering.
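
To illustrate the regex-based exclusion with a compiled-pattern cache mentioned above, here is a minimal standalone sketch. It is not the code from this PR: the class name `ExclusionFilter`, the method `isExcluded`, and the example type/feature names are hypothetical, and the sketch assumes exclusion is decided by matching a full name against each configured pattern.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of UIMA): name-based exclusion with a compiled-pattern cache. */
final class ExclusionFilter {
    // Shared cache: each distinct regex string is compiled once and then reused.
    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    private final List<String> excludePatterns;

    ExclusionFilter(List<String> excludePatterns) {
        // Defensive, immutable copy of the configured regex strings.
        this.excludePatterns = List.copyOf(excludePatterns);
    }

    /** Returns true if the whole name matches any of the exclusion regexes. */
    boolean isExcluded(String name) {
        for (String regex : excludePatterns) {
            Pattern pattern = CACHE.computeIfAbsent(regex, Pattern::compile);
            if (pattern.matcher(name).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed example names; real type and feature names depend on the type system.
        ExclusionFilter filter =
                new ExclusionFilter(List.of("custom\\.Internal.*", ".*:score"));
        System.out.println(filter.isExcluded("custom.InternalNote")); // true
        System.out.println(filter.isExcluded("custom.Token:score"));  // true
        System.out.println(filter.isExcluded("custom.Token:pos"));    // false
    }
}
```

Caching by regex string avoids recompiling the same pattern for every feature structure that is rendered, which is presumably the motivation for the cache in the PR.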
   
   **How to test manually**
   * No specific test procedure
   
   **Automatic testing**
   * [x] PR adds/updates unit tests
   
   **Documentation**
   * [ ] PR adds/updates documentation
   
   **Organizational**
   - [ ] PR adds/updates dependencies.
         <sub><sup>Only dependencies under [approved 
licenses](http://www.apache.org/legal/resolved.html#category-a) are allowed. 
LICENSE and NOTICE files in the respective modules where dependencies have been 
added as well as in the project root have been updated.</sup></sub>
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
