Hi Mike,
Maybe consider using UIMA ("the rule engine")?
BR,
Oleg
On Thu, Dec 20, 2012 at 1:05 PM, Michael McCandless (JIRA)
<[email protected]> wrote:
>
> [
> https://issues.apache.org/jira/browse/TIKA-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>
> Michael McCandless updated TIKA-1048:
> -------------------------------------
>
> Attachment: TIKA-1048.patch
>
> Patch w/ failing test ... I'm not sure where/how to best fix this yet ...
>
> > XMLParser should add whitespace between elements
> > ------------------------------------------------
> >
> > Key: TIKA-1048
> > URL: https://issues.apache.org/jira/browse/TIKA-1048
> > Project: Tika
> > Issue Type: Bug
> > Components: parser
> > Reporter: Michael McCandless
> > Fix For: 1.3
> >
> > Attachments: TIKA-1048.patch
> >
> >
> > If the incoming XML is compact (i.e., doesn't have whitespace between
> > elements), I think we should somehow add whitespace between elements when
> > extracting text?
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA
> administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>