Hmm, no, I'd call this a bug in the current connector.  You can give
it a feed that's perfectly valid, and if you are running in
"dechromed" mode the description that gets indexed might well be
corrupted.

I'll create a ticket.

Karl


On Thu, Mar 24, 2011 at 10:41 PM, Fuad Efendi <[email protected]> wrote:
> Not a bug in the current RSS connector, but probably something important...
>
> The current RSS connector uses XMLFileContext for temporary XML(?), and
> problems may arise here if <description> and <content> contain sub-elements.
> In our specific use case, though, the content is an HTML snippet that we don't
> treat as XML, so unescaped characters are natural there...
>
> So I don't think there are any problems (with the current RSS specs), but we
> might run into problems in the future with other use cases, such as:
> <description>
>        <sub-description-1>     &lt;H1&gt;Header </sub-description-1>
> </description>
>
> The output written to the temporary file will then be malformed XML:
>        <sub-description-1>     <H1>Header </sub-description-1>
>
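To illustrate the failure mode described above: once the feed parser has decoded &lt;H1&gt; into <H1>, writing that text back verbatim produces markup that no longer parses, while escaping &, < and > first keeps the temporary document well-formed. This is a standalone sketch using only the JDK's DOM parser; TempXmlDemo, escapeText and isWellFormed are hypothetical names, not part of ManifoldCF.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of ManifoldCF.
final class TempXmlDemo {
    private TempXmlDemo() {}

    // Escapes the characters that would otherwise be read as markup in XML character data.
    static String escapeText(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Returns true if the string parses as a well-formed XML document.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Text as it looks after the feed parser has decoded &lt;H1&gt;.
        String decoded = "<H1>Header ";
        String raw = "<sub-description-1>" + decoded + "</sub-description-1>";
        String escaped = "<sub-description-1>" + escapeText(decoded) + "</sub-description-1>";
        System.out.println(isWellFormed(raw));     // false
        System.out.println(isWellFormed(escaped)); // true
    }
}
```

The raw version fails because the parser sees an unclosed <H1> start tag before </sub-description-1>; the escaped version round-trips back to the same decoded text.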
> -----Original Message-----
> From: Karl Wright [mailto:[email protected]]
> Sent: March-24-11 10:26 PM
> To: [email protected]
> Subject: Re: XMLWriterContext: tagContext doesn't escape chars
>
> OK, although I am curious: is this a bug in a current connector,
> or is this something new you were trying to do?
>
> Karl
>
> On Thu, Mar 24, 2011 at 10:21 PM, Fuad Efendi <[email protected]> wrote:
>> Hi Karl, I think my initial message was improperly (re)formatted... I
>> suspect connector-user allows HTML, while connector-dev allows only plain
>> text.
>>
>> In the class XMLWriterContext, the method tagContents(char[] ch, int start,
>> int length) should escape special characters before writing to the Writer;
>> beginTag and endTag already do that, and this class obviously exists to
>> output XML.
>> Fortunately, it is easy to extend this class in a "connector" plugin and
>> override this method.
>>
>>
>>  /** This method is meant to be extended by classes that extend this class */
>>  protected void tagContents(char[] ch, int start, int length)
>>    throws ManifoldCFException
>>  {
>>    try
>>    {
>>      theWriter.write(ch,start,length);
>>    }
>>    catch (java.net.SocketTimeoutException e) ... ... ...
>>
>>
>> -Fuad
>>
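A minimal sketch of the override suggested above. Because the real XMLWriterContext and ManifoldCFException live in the ManifoldCF jars, RawWriterContext below is a hypothetical stand-in that only mirrors the quoted tagContents(char[], int, int) signature, and it throws IOException instead of ManifoldCFException. The subclass escapes the XML-special characters and writes the ordinary characters between them in runs, so plain text still costs one Writer.write call per run.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical stand-in for XMLWriterContext: writes character data verbatim,
// which reproduces the behavior discussed in this thread.
class RawWriterContext {
    protected final Writer theWriter;

    RawWriterContext(Writer writer) {
        this.theWriter = writer;
    }

    /** Meant to be overridden by subclasses. */
    protected void tagContents(char[] ch, int start, int length) throws IOException {
        theWriter.write(ch, start, length);
    }
}

// Hypothetical subclass that escapes XML-special characters before writing.
class EscapingWriterContext extends RawWriterContext {
    EscapingWriterContext(Writer writer) {
        super(writer);
    }

    // Returns the entity for an XML-special character, or null for ordinary characters.
    private static String entityFor(char c) {
        switch (c) {
            case '&': return "&amp;";
            case '<': return "&lt;";
            case '>': return "&gt;";
            default: return null;
        }
    }

    @Override
    protected void tagContents(char[] ch, int start, int length) throws IOException {
        int end = start + length;
        int runStart = start; // first character of the current run of ordinary text
        for (int i = start; i < end; i++) {
            String entity = entityFor(ch[i]);
            if (entity == null) {
                continue;
            }
            // Flush the ordinary characters seen so far, then the entity itself.
            theWriter.write(ch, runStart, i - runStart);
            theWriter.write(entity);
            runStart = i + 1;
        }
        // Flush whatever ordinary text remains after the last special character.
        theWriter.write(ch, runStart, end - runStart);
    }
}
```

Against the real class, the same loop would go in a subclass of XMLWriterContext, with the exception handling adapted to match the catch clauses of the original method.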
>> -----Original Message-----
>> From: Karl Wright [mailto:[email protected]]
>> Sent: March-24-11 10:10 PM
>> To: [email protected]
>> Subject: Re: XMLWriterContext: tagContext doesn't escape chars
>>
>> Could you resend your previous message?  I don't think it made it
>> through; perhaps you were not signed up for the list at that point.
>> This is the first message in this thread that reached the list.
>>
>> Thanks,
>> Karl
>>
>> On Thu, Mar 24, 2011 at 7:22 PM, Fuad Efendi <[email protected]> wrote:
>>> I just found it.
>>>
>>>  /** This method is meant to be extended by classes that extend this class */
>>>  protected void tagContents(char[] ch, int start, int length)
>>>    throws ManifoldCFException
>>>  {
>>>    try
>>>    {
>>>      theWriter.write(ch,start,length);
>>>    }
>>>    catch (java.net.SocketTimeoutException e)
>>> ...
>>>
>>> And we are using temp files with the RSS connector.
>>>
>>> I tried to split a big feed into "entities", each stored as an XML document,
>>> but I found that some XML-escaped characters end up unescaped (for
>>> instance, RSS may contain an HTML snippet as the value of an element).
>>>
>>> -Fuad
>
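One way to split a large feed into per-item documents without losing the escaping is to let a real XML serializer write each piece, rather than copying parsed text straight to a Writer. The sketch below uses the JDK's identity Transformer (javax.xml.transform) on each <item> element; this is an alternative technique, not what the RSS connector does, and both the element name "item" and the class name FeedSplitter are assumptions for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of ManifoldCF.
final class FeedSplitter {
    private FeedSplitter() {}

    // Splits a feed into one standalone XML string per <item> element.
    static List<String> splitItems(String feedXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(feedXml)));
        // The parser has already decoded &lt;H1&gt; into "<H1>" in the text nodes.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        List<String> items = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("item");
        for (int i = 0; i < nodes.getLength(); i++) {
            StringWriter out = new StringWriter();
            // The serializer re-escapes '<' and '&' in text nodes, unlike a raw Writer.write().
            serializer.transform(new DOMSource(nodes.item(i)), new StreamResult(out));
            items.add(out.toString());
        }
        return items;
    }
}
```

Each returned string can be parsed again on its own, and its text content comes back as the original decoded HTML snippet.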
