Emmanuel Courcelle wrote:
>> The only significant change you have made is Rgn R? to Rgn R. Does it 
>> still work if you change this to "Rgn R\E.\Q"?
>>
>>   
> Great: I took the autogenerated regex, replaced Rgn ? by Rgn R\E.\Q 
> without changing anything else and it worked.
> 
> However, I also changed the column parameters, activating the 
> 'Expression (versus simple)' box, and configuring
> 
> Rgn R2                  \Rgn R.\
> 
> the import still does not work.

There will still be a problem with the column mappings. Regular 
expressions can't be used here. It must be an exact string match, so I 
don't think it is possible to use the column name in this case. You 
can, however, use the column number. The columns are numbered from 0, 
so in the file we have been testing on, the "Rgn R²" column should be 
column number 32 and the mapping to use is \32\.
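
To illustrate why the number works where the name does not, here is a 
minimal sketch in Java. It is not the actual BASE code: the class 
ColumnLookup and the method resolve() are made up for this mail, and I 
assume that a purely numeric key means a 0-based position while 
anything else must match a header exactly.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of BASE: resolves a mapping key to a column.
final class ColumnLookup {
    /**
     * Returns the 0-based column index for the key, or -1 if none matches.
     * Assumption: a numeric key is a position, any other key is a header
     * name that must match exactly (no regular expressions).
     */
    static int resolve(List<String> headers, String key) {
        if (key.matches("\\d{1,9}")) {
            int index = Integer.parseInt(key);
            return index < headers.size() ? index : -1;
        }
        return headers.indexOf(key);
    }

    public static void main(String[] args) {
        // The header as it looks after a failed decode: the superscript
        // has become the replacement character U+FFFD.
        List<String> headers = Arrays.asList("ID", "Rgn R\uFFFD");
        System.out.println(resolve(headers, "Rgn R.")); // no exact match
        System.out.println(resolve(headers, "1"));      // found by position
    }
}
```

The name-based lookup fails because "Rgn R." is compared literally, 
while the position does not depend on how the header was decoded.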

Anyway, I think I now know what the source of the problem is. When we 
open the file for parsing we don't specify a character encoding, so 
the encoding used depends on the system the server is running on. On my 
machine the default encoding is UTF-8. But the file is not encoded in 
UTF-8, and the byte that represents the superscripted 2 in Rgn R² is 
not a valid UTF-8 sequence. That is what causes it to be replaced by 
a ? in the first place. Then, when the file contents are sent to the 
browser, a different encoding is used. This is configurable in the 
<basedir>/www/WEB-INF/web.xml file and defaults to ISO-8859-1 (i.e. the 
regular encoding used by the Western European languages). This further 
corrupts the invalid file contents, and it is not surprising that the 
generated regular expression no longer matches after this roundtrip.
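
The roundtrip can be reproduced in a few lines of Java. This is only a 
sketch and not the BASE code: the class name EncodingRoundTrip is made 
up, and I assume that the file is encoded in ISO-8859-1, which I have 
not verified.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of BASE.
final class EncodingRoundTrip {
    /** Decodes the bytes as UTF-8, then re-encodes them as ISO-8859-1. */
    static String roundTrip(byte[] fileBytes) {
        // Step 1: parsing with UTF-8. The lone byte 0xB2 is malformed
        // input, so the decoder substitutes U+FFFD.
        String parsed = new String(fileBytes, StandardCharsets.UTF_8);
        // Step 2: sending to the browser as ISO-8859-1. U+FFFD cannot be
        // represented in that charset, so it is encoded as '?'.
        byte[] sent = parsed.getBytes(StandardCharsets.ISO_8859_1);
        return new String(sent, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Assumption: the file stores the superscript 2 as the single
        // byte 0xB2, as ISO-8859-1 does.
        byte[] header = "Rgn R\u00B2".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(roundTrip(header)); // prints "Rgn R?"
    }
}
```

Decoding the same bytes with ISO-8859-1 in step 1 would keep the 
superscript intact, which is why the parser needs to know the encoding.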

What is a bit surprising is that it worked on my development machine, 
but that is probably because I had configured my web.xml to use UTF-8. 
So even if the file wasn't parsed correctly in the first place, the 
roundtrip to the browser didn't destroy the regular expressions. As 
soon as I changed my encoding to ISO-8859-1 I got the same problem as 
originally reported.

Now, if you have managed to follow me this far, the question is what to 
do about this problem. Clearly we can't parse the files using the system 
default encoding, since it is probably not the correct encoding in most 
cases. We probably need this to be a configurable option for every file 
that we parse. This is, however, a rather big change, and as a quick fix 
I think putting a default encoding option in the base.config file could 
be a good idea. If no option is specified, the default choice will be 
ISO-8859-1.
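
Roughly, the quick fix could look like the sketch below. Nothing of 
this exists yet: the class DefaultEncoding and the option name 
"parser.default.encoding" are placeholders I made up here, and I 
assume the settings from base.config are available as a 
java.util.Properties object.

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper, not part of BASE.
final class DefaultEncoding {
    // Placeholder option name; the real name is not decided.
    static final String KEY = "parser.default.encoding";

    /** Returns the configured charset, or ISO-8859-1 if none is set. */
    static Charset fromConfig(Properties config) {
        String name = config.getProperty(KEY);
        if (name == null || name.trim().isEmpty()) {
            return StandardCharsets.ISO_8859_1;
        }
        // Throws an unchecked exception if the name is not supported.
        return Charset.forName(name.trim());
    }

    /** Opens the stream with an explicit charset, never the system default. */
    static BufferedReader open(InputStream in, Properties config) {
        return new BufferedReader(new InputStreamReader(in, fromConfig(config)));
    }
}
```

The important part is that the reader always gets an explicit charset, 
so the result no longer depends on the machine the server runs on.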

The quick fix can be released in 2.1.1, but the more general solution 
has to wait for 2.2 or maybe even 2.3.

/Nicklas

> 
> E.C.
> 
> _______________________________________________
> The BASE general discussion mailing list
> basedb-users@lists.sourceforge.net
> unsubscribe: send a mail with subject "unsubscribe" to
> [EMAIL PROTECTED]


