[ 
https://issues.apache.org/jira/browse/CAMEL-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681032#comment-13681032
 ] 

Aki Yoshida commented on CAMEL-6004:
------------------------------------

Hi,
I have a question for the Camel team.

I got distracted by this cute regex-based XML tokenizer and made it work for
both cases.

However, the original approach has several inherent limitations when parsing
a tree structure: it cannot handle same-named elements appearing at different
depths, and it cannot extract namespace declarations from several ancestor
depths (the current approach only extracts additional namespace declarations
from the one depth specified by the inheritNamespaceToken parameter). So it
works perfectly when the input document is constrained so that the splitting
elements do not appear at different depths in the hierarchy and any additional
namespace declarations needed in the split elements come from a single
ancestor depth. It does not work for more general cases.
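
To illustrate the depth problem, here is a minimal sketch. The pattern and the
class name RegexDepthDemo are made up for this example and are not the actual
Camel tokenizer code; the sketch only assumes a tokenizer that matches from an
opening tag to the nearest closing tag of the same name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDepthDemo {

    // Hypothetical naive tokenizer: matches from an opening <tag ...> up to
    // the NEAREST </tag>, with no notion of nesting depth.
    public static List<String> tokenize(String xml, String tag) {
        Pattern p = Pattern.compile(
                "<" + tag + "(\\s[^>]*)?>.*?</" + tag + ">", Pattern.DOTALL);
        List<String> tokens = new ArrayList<>();
        Matcher m = p.matcher(xml);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The same element name appears at two depths.
        String xml = "<root><item id=\"1\"><item id=\"2\">x</item></item></root>";
        for (String token : tokenize(xml, "item")) {
            System.out.println(token);
        }
    }
}
```

With nested same-named elements, the match for the outer element stops at the
inner closing tag, so the token contains two opening tags but only one closing
tag and is not well-formed.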

As long as the input document fits this constraint, the regex-based approach
is more efficient, as it does not need to construct the XML artifacts that
would then have to be serialized into a token. For other cases, however, it
would be more practical to use a StAX-based tokenizer that builds the valid
namespace context and serializes the content into a token, instead of relying
purely on regex-based parsing.
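
A rough sketch of what such a StAX-based tokenizer could look like follows. It
is only an illustration under some assumptions, not a proposed implementation:
the class name StaxTokenizerSketch and its tokenize method are hypothetical,
the input is a String rather than a stream, tokens are matched by local name
only, every in-scope ancestor declaration is copied onto the token (even
unused ones), and comments and processing instructions are dropped.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxTokenizerSketch {

    /** Returns each outermost element named localName as a standalone XML string. */
    public static List<String> tokenize(String xml, String localName) {
        List<String> tokens = new ArrayList<>();
        // Namespace declarations of the currently open ancestors, innermost first.
        Deque<Map<String, String>> scopes = new ArrayDeque<>();
        StringBuilder out = null; // non-null while we are inside a token
        int depth = 0;            // open elements inside the current token
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    Map<String, String> ns = new LinkedHashMap<>();
                    if (out == null && localName.equals(r.getLocalName())) {
                        out = new StringBuilder();
                        // Token root: inherit from all ancestors, outermost first,
                        // so that inner declarations override outer ones.
                        Iterator<Map<String, String>> it = scopes.descendingIterator();
                        while (it.hasNext()) {
                            ns.putAll(it.next());
                        }
                    }
                    // The element's own declarations (null prefix = default namespace).
                    for (int i = 0; i < r.getNamespaceCount(); i++) {
                        ns.put(nullToEmpty(r.getNamespacePrefix(i)),
                               nullToEmpty(r.getNamespaceURI(i)));
                    }
                    if (out == null) {
                        scopes.push(ns); // plain ancestor: just remember its declarations
                    } else {
                        depth++;
                        writeStart(r, ns, out);
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (out == null) {
                        scopes.pop();
                    } else {
                        out.append("</").append(qname(r.getPrefix(), r.getLocalName())).append('>');
                        if (--depth == 0) { // the token root has just been closed
                            tokens.add(out.toString());
                            out = null;
                        }
                    }
                } else if (out != null && (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA
                        || event == XMLStreamConstants.SPACE)) {
                    out.append(escape(r.getText()));
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
        return tokens;
    }

    private static void writeStart(XMLStreamReader r, Map<String, String> ns, StringBuilder out) {
        out.append('<').append(qname(r.getPrefix(), r.getLocalName()));
        for (Map.Entry<String, String> e : ns.entrySet()) {
            out.append(e.getKey().isEmpty() ? " xmlns" : " xmlns:" + e.getKey())
               .append("=\"").append(escape(e.getValue())).append('"');
        }
        for (int i = 0; i < r.getAttributeCount(); i++) {
            out.append(' ').append(qname(r.getAttributePrefix(i), r.getAttributeLocalName(i)))
               .append("=\"").append(escape(r.getAttributeValue(i))).append('"');
        }
        out.append('>');
    }

    private static String qname(String prefix, String local) {
        return prefix == null || prefix.isEmpty() ? local : prefix + ":" + local;
    }

    private static String nullToEmpty(String s) {
        return s == null ? "" : s;
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Because the depth counter tracks nesting, a same-named element inside a token
stays part of that token instead of ending it. A self-closing element is
reported by StAX as a start event followed by an end event, so it comes out as
an explicit open/close pair. The price is the extra serialization work
mentioned above.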

I don't know what typical use cases look like, or whether we should stay with
(or at least keep) the regex-based approach or move to a StAX-based approach.

regards, aki

                
> Tokenize XML does not support self-closing XML tokens
> -----------------------------------------------------
>
>                 Key: CAMEL-6004
>                 URL: https://issues.apache.org/jira/browse/CAMEL-6004
>             Project: Camel
>          Issue Type: Bug
>          Components: camel-core
>    Affects Versions: 2.10.2
>            Reporter: Greg Heidorn
>            Assignee: Willem Jiang
>            Priority: Minor
>             Fix For: Future
>
>         Attachments: GenericTokenizeTest.java
>
>
> Tokenize creates non-well-formed XML when handling self-closing XML tokens.
> Tokenize should support parsing tokens that either have a closing tag or
> are self-closing.

