Hi,
My regular expression is the following:

headlineRegex  ->  (<h1>)?(.*)</h1>
group  ->  2
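The sketch below shows what group 2 actually captures. It runs the same pattern through the JDK's `java.util.regex` package instead of `org.apache.regexp`. That swap is only so the example needs no extra library, and Jakarta Regexp may differ in edge cases. The class and method names are hypothetical. Because `(<h1>)?` is optional, a match can begin anywhere on a line, so group 2 may also contain markup that comes before the `<h1>` tag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; uses java.util.regex as a stand-in for org.apache.regexp.RE.
public class HeadlineGroupDemo {
    // The pattern from the post; group 2 is meant to be the headline text.
    static final Pattern HEADLINE = Pattern.compile("(<h1>)?(.*)</h1>");

    // Hypothetical helper: returns group 2 of the first match, or null if nothing matches.
    static String firstHeadline(String content) {
        Matcher m = HEADLINE.matcher(content);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // The tag is at the start, so "(<h1>)?" consumes it and group 2 is just the text.
        System.out.println(firstHeadline("<h1>Title</h1>"));
        // The optional group matches empty at position 0, and the greedy "(.*)"
        // swallows the preceding markup as well.
        System.out.println(firstHeadline("<p>Intro</p><h1>Title</h1>"));
    }
}
```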

I use the regular expression above to extract the headline (h1) from an
HTML document in this loop:

    while (mHeadlineRE.match(content, offset)) {
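A complete extraction loop might look like the following sketch. It again assumes `java.util.regex` rather than `org.apache.regexp.RE`, and `collectMatches` is a hypothetical helper name. There is no explicit `offset` because `Matcher.find()` resumes after the previous match by itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch using java.util.regex instead of org.apache.regexp.RE.
public class HeadlineLoopSketch {
    // Hypothetical helper: returns the requested capturing group from every match.
    static List<String> collectMatches(String content, Pattern pattern, int group) {
        List<String> result = new ArrayList<>();
        Matcher m = pattern.matcher(content);
        // find() continues from the end of the previous match, which plays
        // the role of the explicit offset in the original loop.
        while (m.find()) {
            result.add(m.group(group));
        }
        return result;
    }

    public static void main(String[] args) {
        Pattern headline = Pattern.compile("(<h1>)?(.*)</h1>");
        String page = "<h1>First</h1>\n<p>text</p>\n<h1>Second</h1>";
        // '.' does not match '\n' by default, so each match stays on one line.
        System.out.println(collectMatches(page, headline, 2)); // [First, Second]
    }
}
```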

For some HTML pages this regular expression works, but for others it
throws the following error:

[...]
        at org.apache.regexp.RE.matchNodes(Unknown Source)
        at org.apache.regexp.RE.matchNodes(Unknown Source)
        at org.apache.regexp.RE.matchNodes(Unknown Source)
        at org.apache.regexp.RE.matchNodes(Unknown Source)
        at org.apache.regexp.RE.matchNodes(Unknown Source)
        at org.apache.regexp.RE.matchNodes(Unknown Source)
        at org.apache.regexp.RE.matchNodes(Unknown Source)
        at org.apache.regexp.RE.matchNodes(Unknown Source)
        at org.apache.regexp.RE.matchNodes(Unknown Source)
        at org.apache.regexp.RE.matchNodes(Unknown Source)
        at org.apache.regexp.RE.matchAt(Unknown Source)
        at org.apache.regexp.RE.match(Unknown Source)
        at org.apache.regexp.RE.match(Unknown Source)
        at net.sf.regain.crawler.preparator.html.HtmlContentExtractor.extractHeadlines(HtmlContentExtractor.java:140)

The error occurs on this line:

    while (mHeadlineRE.match(content, offset)) {


When I use this regex instead, I never get this error:

headlineRegex  ->  <h1>(.*)</h1>
group  ->  1
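One plausible reading of the difference, not verified against the Jakarta Regexp source: with `<h1>` required, an attempt at a position that does not start with `<h1>` fails at once. With `(<h1>)?` optional, every position on every line runs the greedy `(.*)` to the end of the line and then backtracks. The long run of `matchNodes` frames suggests the engine recurses while doing this. The sketch below keeps the opening tag mandatory (allowing attributes) and also makes the group reluctant, case-insensitive and able to span lines. It uses `java.util.regex`, and both the pattern and the class name are a suggested, hypothetical alternative rather than anything from the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical alternative pattern, shown with java.util.regex.
public class TighterHeadlinePattern {
    // Requires the opening tag (attributes allowed), stops at the first
    // closing tag (reluctant .*?), ignores case, and lets '.' span newlines.
    static final Pattern H1 = Pattern.compile(
            "<h1[^>]*>(.*?)</h1>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed text of every <h1> element found.
    static List<String> headlines(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = H1.matcher(html);
        while (m.find()) {
            result.add(m.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<p>Intro</p><H1 class=\"t\">First</h1><h1>\n  Second\n</h1>";
        System.out.println(headlines(html)); // [First, Second]
    }
}
```

Requiring the opening tag is what the working second pattern already does. The reluctant `.*?` additionally keeps two headlines on the same line from being merged into one match.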

The problem is that I don't even know how to debug this error, which is
why I am asking for help here.
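One way to narrow the problem down is to run the pattern against each line of a failing page separately and report the first line that fails, together with its length. Splitting by line is reasonable here because `.` does not match a newline by default. The sketch below does that. `firstFailingLine` is a hypothetical helper. It catches `StackOverflowError` only as an assumption, because the actual exception is cut from the trace above and the repeated `matchNodes` frames merely suggest deep recursion. With Jakarta Regexp, the predicate would presumably be something like `line -> mHeadlineRE.match(line)`.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical debugging harness; the matcher is passed in as a predicate.
public class FindFailingLine {
    // Applies the matcher to each line on its own and returns the 0-based index
    // of the first line that throws, or -1 if every line is handled without error.
    static int firstFailingLine(String content, Predicate<String> matcher) {
        String[] lines = content.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            try {
                matcher.test(lines[i]);
            } catch (RuntimeException | StackOverflowError e) {
                // Assumed failure types; report where it failed and how long the line is.
                System.err.println("line " + i + " (length " + lines[i].length()
                        + ") failed with " + e.getClass().getName());
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // java.util.regex stands in for org.apache.regexp.RE here.
        Pattern headline = Pattern.compile("(<h1>)?(.*)</h1>");
        String page = "<html>\n<h1>Title</h1>\n<p>Body</p>\n</html>";
        int bad = firstFailingLine(page, line -> headline.matcher(line).find());
        System.out.println(bad == -1 ? "no failing line" : "first failing line: " + bad);
    }
}
```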

Any help is much appreciated. Thank you!


__________________________________

   Matthew




