Emmanuel Courcelle wrote:
>> The only significant change you have made is Rgn R? to Rgn R. Does it
>> still work if you change this to "Rgn R\E.\Q"?
>>
> Great: I took the autogenerated regex, replaced Rgn ? by Rgn R\E.\Q
> without changing anything else and it worked.
>
> However, I changed also the columns parameters, activating the
> 'Expression (versus simple) box, and configuring
>
> Rgn R2 \Rgn R.\
>
> the import still does not work.
There will still be a problem with the column mappings. Regular expressions can't be used here. It must be an exact string match, so I don't think it is possible to use the column name in this case. You can, however, use the column number. The columns are numbered from 0, so in the file we have been testing on, the "Rgn R²" column should be column number 32 and the mapping to use is \32\.

Anyway, I think I now know what the source of the problem is. When we open the file for parsing we don't specify a character encoding, so the encoding used depends on the system the server is running on. On my machine the default encoding is UTF-8. But the file is not encoded in UTF-8, and the superscript 2 in Rgn R² is invalid in UTF-8. That is what causes it to be replaced by a ? in the first place.

Then, when the file contents are sent to the browser, a different encoding is used. This is configurable in the <basedir>/www/WEB-INF/web.xml file and defaults to ISO-8859-1 (i.e. the regular encoding used by the western European languages). This further corrupts the invalid file contents, and it is not surprising that the generated regular expression no longer matches after this round trip.

What is a bit surprising is that it worked on my development machine, but that is probably because I had configured my web.xml to use UTF-8. So even if the file wasn't parsed correctly in the first place, the round trip to the browser didn't destroy the regular expressions. As soon as I changed my encoding to ISO-8859-1 I got the same problem as originally reported.

Now, if you have managed to follow me this far, the question is what to do about this problem. Clearly we can't parse the files using the system default encoding, since it is probably not the correct encoding in most cases. We probably need this to be a configurable option for every file that we parse. This is, however, a rather big change, and as a quick fix I think putting a default encoding option in the base.config file could be a good idea.
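
As a rough illustration of the round trip described above, here is a small sketch. It is written in Python rather than the Java that BASE uses, and it assumes the file is encoded in ISO-8859-1, which has not been verified. The names decode_file_bytes and DEFAULT_ENCODING are made up for this example and are not part of BASE.

```python
# Hypothetical illustration only; BASE itself is a Java application.
DEFAULT_ENCODING = "ISO-8859-1"  # fallback value proposed in this mail


def decode_file_bytes(raw, encoding=None):
    """Decode raw file bytes with an explicit encoding, else the fallback.

    Undecodable bytes are replaced rather than raising an error, which is
    an assumption about how the parser behaves.
    """
    return raw.decode(encoding or DEFAULT_ENCODING, errors="replace")


# The column header as stored on disk, ASSUMING the file is ISO-8859-1.
# The superscript 2 (U+00B2) becomes the single byte 0xB2.
raw_header = "Rgn R\u00b2".encode("iso-8859-1")

# Step 1: the server parses the file as UTF-8. A lone 0xB2 byte is not
# valid UTF-8, so it is replaced by U+FFFD (the replacement character).
misread = decode_file_bytes(raw_header, "utf-8")

# Step 2: the page is sent to the browser as ISO-8859-1. U+FFFD cannot be
# represented there, so it degrades to a plain question mark.
sent_to_browser = misread.encode("iso-8859-1", errors="replace")

# With the fallback encoding the header survives intact.
correct = decode_file_bytes(raw_header)

print(ascii(misread), sent_to_browser, ascii(correct))
```

The last call shows the intent of a default encoding option: when no encoding is given, a fixed, documented fallback is used instead of whatever the server's platform default happens to be.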
If no option is specified, the default will be ISO-8859-1. The quick fix can be released in 2.1.1, but the more general solution has to wait for 2.2 or maybe even 2.3.

/Nicklas

_______________________________________________
The BASE general discussion mailing list
basedb-users@lists.sourceforge.net
unsubscribe: send a mail with subject "unsubscribe" to
[EMAIL PROTECTED]