Thanks Karl, I'll submit a patch
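
For reference, here is a minimal sketch of the kind of entity-escaping being discussed. It is not the actual patch: the class name XmlEscapeSketch and the helper writeEscaped are hypothetical, and it only assumes that tagContents ends up writing a char[] range to a java.io.Writer, as in the snippet quoted below. It handles just the three characters that matter in element content (&, <, >).

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper, not part of ManifoldCF: illustrates escaping
// element text before it is written out.
final class XmlEscapeSketch {

  /** Writes ch[start .. start+length) to out, replacing &, < and > with entities. */
  static void writeEscaped(Writer out, char[] ch, int start, int length)
      throws IOException {
    for (int i = start; i < start + length; i++) {
      char c = ch[i];
      switch (c) {
        case '&': out.write("&amp;"); break;
        case '<': out.write("&lt;"); break;
        case '>': out.write("&gt;"); break;
        default:  out.write(c);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    StringWriter w = new StringWriter();
    char[] text = "<H1>Header".toCharArray();
    writeEscaped(w, text, 0, text.length);
    System.out.println(w);  // prints &lt;H1&gt;Header
  }
}
```

In a tagContents override, the call to theWriter.write(ch,start,length) would presumably be replaced by something along these lines, with the existing exception handling left as it is.
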
-Fuad

-----Original Message-----
From: Karl Wright [mailto:[email protected]] 
Sent: March-25-11 12:20 AM
To: [email protected]
Subject: Re: XMLWriterContext: tagContext doesn't escape chars

CONNECTORS-171.

Fuad, I think it's perfectly reasonable to modify XMLWriterContext to simply
add proper entity-escaping of characters.  Would you like to submit a patch?

Karl


On Fri, Mar 25, 2011 at 12:12 AM, Karl Wright <[email protected]> wrote:
> Hmm, no, I'd call this a bug in the current connector.  You can give 
> it a feed that's perfectly valid, and if you are running in 
> "dechromed" mode the description that gets indexed might well be 
> corrupted.
>
> I'll create a ticket.
>
> Karl
>
>
> On Thu, Mar 24, 2011 at 10:41 PM, Fuad Efendi <[email protected]> wrote:
>> Not a bug with current RSS connector, but something probably important...
>>
>> Current RSS connector uses XMLFileContext for temporary XML(?), and
>> problems may happen here if <description> and <content> contain
>> sub-elements... but in our specific use case it is an HTML snippet, and
>> we don't consider it XML, so unescaped characters are natural...
>>
>> So I think there are no problems (with current RSS specs), but we
>> might have problems in the future with other use cases such as:
>> <description>
>>        <sub-description-1>     &lt;H1&gt;Header </sub-description-1> 
>> </description>
>>
>>
>> Output to temp. file will be malformed XML:
>>        <sub-description-1>     <H1> Header </sub-description-1>
>>
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Karl Wright [mailto:[email protected]]
>> Sent: March-24-11 10:26 PM
>> To: [email protected]
>> Subject: Re: XMLWriterContext: tagContext doesn't escape chars
>>
>> Ok, although I am curious whether this is a bug with a current connector?
>> Or is this something new you were trying to do?
>>
>> Karl
>>
>> On Thu, Mar 24, 2011 at 10:21 PM, Fuad Efendi <[email protected]> wrote:
>>> Hi Karl, I think initial message was improperly (re)formatted... I
>>> suspect connector-user allows HTML, and connector-dev allows only
>>> plain text.
>>>
>>> The class XMLWriterContext, method tagContents(char[] ch, int start,
>>> int length) should escape special characters before writing to Writer...
>>> beginTag and endTag already do that; obviously this class is needed 
>>> to output XML.
>>> Fortunately it is easy to extend this class in "connector" plugin 
>>> and override this method.
>>>
>>>
>>>  /** This method is meant to be extended by classes that extend this class */
>>>  protected void tagContents(char[] ch, int start, int length)
>>>    throws ManifoldCFException
>>>  {
>>>    try
>>>    {
>>>      theWriter.write(ch,start,length);
>>>    }
>>>    catch (java.net.SocketTimeoutException e) ... ... ...
>>>
>>>
>>> -Fuad
>>>
>>>
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Karl Wright [mailto:[email protected]]
>>> Sent: March-24-11 10:10 PM
>>> To: [email protected]
>>> Subject: Re: XMLWriterContext: tagContext doesn't escape chars
>>>
>>> Could you resend your previous message?  I don't think it made it 
>>> through; perhaps you were not signed up for the list at that point.
>>> This is the first message of this thread that was posted.
>>>
>>> Thanks,
>>> Karl
>>>
>>> On Thu, Mar 24, 2011 at 7:22 PM, Fuad Efendi <[email protected]> wrote:
>>>> I just found it.
>>>>
>>>>  /** This method is meant to be extended by classes that extend this class */
>>>>  protected void tagContents(char[] ch, int start, int length)
>>>>    throws ManifoldCFException
>>>>  {
>>>>    try
>>>>    {
>>>>      theWriter.write(ch,start,length);
>>>>    }
>>>>    catch (java.net.SocketTimeoutException e)
>>>> ...
>>>>
>>>> And we are using temp files with the RSS connector.
>>>>
>>>> I tried to split a big feed into "entities", stored as XML
>>>> documents, but I found that some XML-escaped characters get
>>>> unescaped (for instance, RSS may contain an HTML snippet as the value
>>>> of an element).
>>>>
>>>> -Fuad
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
