This is not a bug in the current RSS connector, but it is probably important...
The current RSS connector uses XMLFileContext for a temporary XML(?) file, and
problems may happen there if <description> and <content> contain sub-elements...
In our specific use case the content is an HTML snippet, which we don't treat
as XML, so unescaped characters are natural...
So I think there are no problems with the current RSS specs, but we might
have problems in the future with other use cases, such as:
<description>
<sub-description-1> &lt;H1&gt;Header </sub-description-1>
</description>
The output written to the temp file will then be malformed XML:
<sub-description-1> <H1> Header </sub-description-1>
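To illustrate the kind of escaping I mean, here is a minimal standalone sketch. The class name EscapedTextSketch and the method writeEscaped are made up for this example and are not part of ManifoldCF; I am assuming the same loop could go into an override of tagContents in a subclass of XMLWriterContext, writing to theWriter instead of the StringWriter used here. It only handles the three characters that matter for element text (&, < and >).

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

/** Hypothetical helper (not a ManifoldCF class): escapes character data before writing it. */
class EscapedTextSketch {

  /** Writes ch[start .. start+length) to out, replacing XML special characters with entities. */
  static void writeEscaped(Writer out, char[] ch, int start, int length) throws IOException {
    for (int i = start; i < start + length; i++) {
      switch (ch[i]) {
        case '&': out.write("&amp;"); break;
        case '<': out.write("&lt;"); break;
        case '>': out.write("&gt;"); break;
        default: out.write(ch[i]);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // Element text as a parser would hand it over: already unescaped.
    char[] text = " <H1>Header ".toCharArray();
    StringWriter out = new StringWriter();
    out.write("<sub-description-1>");
    writeEscaped(out, text, 0, text.length);
    out.write("</sub-description-1>");
    // The unclosed <H1> is written as text, so the element stays well-formed.
    System.out.println(out);
  }
}
```

With escaping applied, the unclosed H1 ends up as plain text inside the element rather than as markup, so the temp file should remain parseable.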
-----Original Message-----
From: Karl Wright [mailto:[email protected]]
Sent: March-24-11 10:26 PM
To: [email protected]
Subject: Re: XMLWriterContext: tagContext doesn't escape chars
Ok, although I am curious whether this is a bug with a current connector?
Or is this something new you were trying to do?
Karl
On Thu, Mar 24, 2011 at 10:21 PM, Fuad Efendi <[email protected]> wrote:
> Hi Karl, I think initial message was improperly (re)formatted... I
> suspect connector-user allows HTML, and connector-dev allows only plain
> text.
>
> The method tagContents(char[] ch, int start, int length) of the class
> XMLWriterContext should escape special characters before writing to the
> Writer... beginTag and endTag already do that; obviously this class is
> needed to output XML.
> Fortunately it is easy to extend this class in a "connector" plugin and
> override this method.
>
>
> /** This method is meant to be extended by classes that extend this class */
> protected void tagContents(char[] ch, int start, int length)
>   throws ManifoldCFException
> {
>   try
>   {
>     theWriter.write(ch,start,length);
>   }
>   catch (java.net.SocketTimeoutException e) ... ... ...
>
> -Fuad
>
> -----Original Message-----
> From: Karl Wright [mailto:[email protected]]
> Sent: March-24-11 10:10 PM
> To: [email protected]
> Subject: Re: XMLWriterContext: tagContext doesn't escape chars
>
> Could you resend your previous message? I don't think it made it
> through; perhaps you were not signed up for the list at that point.
> This is the first message of this thread that was posted.
>
> Thanks,
> Karl
>
> On Thu, Mar 24, 2011 at 7:22 PM, Fuad Efendi <[email protected]> wrote:
>> I just found it.
>>
>> /** This method is meant to be extended by classes that extend this class */
>> protected void tagContents(char[] ch, int start, int length)
>>   throws ManifoldCFException
>> {
>>   try
>>   {
>>     theWriter.write(ch,start,length);
>>   }
>>   catch (java.net.SocketTimeoutException e)
>>   ...
>>
>> And we are using temp files with the RSS connector.
>>
>> I tried to split a big feed into "entities", each stored as an XML
>> document, but I found that some XML-escaped characters get unescaped
>> (for instance, RSS may contain an HTML snippet as the value of an element).
>>
>> -Fuad
>>
>
>