On 18/03/2008, Sonam Chauhan <[EMAIL PROTECTED]> wrote:
> Hello List -
>
> Just wanted some advice fixing an issue with JMX file encoding that has been
> affecting me for some time (extended thread below).
>
> Some test scripts written way back in JMeter 1.9 posted extended ASCII
> characters (including 'ý' and 'ü') to a web server.
>
> When migrating to JMeter 2.1.1, the scripts broke - JMeter would now send
> incorrect data to the server. With Sebb's help, I found that setting the
> LANG environment variable to 'en_AU' made them work properly again.
>
> When migrating to JMeter 2.3.1, this workaround no longer worked. I then
> changed the following property in <JMeter>/bin/saveservice.properties from
> UTF-8 to ISO-8859-1:
> --------------------------------
> # Character set encoding used to read and write JMeter XML files
> #
> _file_encoding=ISO-8859-1
> --------------------------------
>
> The scripts now work properly again.
>
> However, any new JMX scripts created now have this header:
> <?xml version="1.0" encoding="ISO-8859-1"?>
>
> Some of the scripts I created earlier have this in the header:
> <?xml version="1.0" encoding="UTF-8"?>
Those headers reflect the _file_encoding property that was in effect
when each script was saved.
> Some scripts (the oldest) have no XML headers at all.
>
The header encoding was not written by JMeter 2.1.1.
> The 'UTF-8' scripts work fine (even with '_file_encoding=ISO-8859-1') --
> probably since those tests only handle 'normal' ASCII data (no extended
> characters).
They should work even if there are non-ASCII characters, so long as
they are UTF-8 encoded ...
The idea of adding the encoding header was to ensure portability of
scripts - once the script has been created in a particular encoding,
it should run OK regardless of what the file encoding property is set
to.
If that is not happening, then it is a bug that needs to be fixed ...
> For uniformity, and flexibility in future testing I wanted to bulk-convert
> all existing JMeter scripts to use the 'UTF-8' encoding only. Here's what I
> plan to do:
>
> THE CUNNING PLAN!
> ==================
>
> 1. Convert all JMX scripts to UTF-8
> I plan to use the following Perl one-liner as a Perl UTF-8 conversion
> utility:
> --------------------------------
> [ After installing the 'Unicode::MapUTF8' CPAN module ]
>
> cat ISO-8859-1.jmx | perl -lne ' use Unicode::MapUTF8 qw(to_utf8); print
> to_utf8({ -string => $_, -charset => "ISO-8859-1" });' > UTF-8.jmx
No need to use cat, you can give the filename directly to Perl:
perl -lne ' use Unicode:...' ISO-8859-1.jmx > UTF-8.jmx
But see below.
> --------------------------------
>
> 2. Change the _file_encoding property back to UTF-8
>
> 3. Manually change any 'ISO-8859-1' XML headers in JMX scripts to
> 'UTF-8'
Or use Perl to do it.
It's probably easier to use a Perl script to do this all in one go;
something like:
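#=== iso2utf8.pl =====
use strict;
use Unicode::MapUTF8 qw(to_utf8);
# Read the first line, normally the <?xml header, and update the
# declared encoding. The oldest scripts have no header, so only
# touch the line if it really is an XML declaration.
my $first = <>;
$first =~ s/ISO-8859-1/UTF-8/ if $first =~ /^<\?xml/;
print to_utf8({ -string => $first, -charset => "ISO-8859-1" });
# Convert the rest of the file from ISO-8859-1 to UTF-8
while (<>) {
    print to_utf8({ -string => $_, -charset => "ISO-8859-1" });
}
#=========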
$ perl -w iso2utf8.pl ISO-8859-1.jmx > UTF-8.jmx
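One thing to watch when bulk-converting: the script above assumes every
input really is ISO-8859-1. Run over a file that is already UTF-8, it
would double-encode any non-ASCII characters. Below is a rough sketch of
a guard for that, written in Python purely for illustration. The
function name to_utf8_once is made up for this example, and the "already
valid UTF-8 means leave it alone" test is a heuristic of the sketch, not
something JMeter does.

```python
def to_utf8_once(data: bytes) -> bytes:
    """Return data as UTF-8 bytes, converting from ISO-8859-1 only if needed.

    Heuristic (an assumption of this sketch): input that already decodes
    as UTF-8 is left alone; anything else is treated as ISO-8859-1.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        text = data.decode("iso-8859-1")
    # Update the declared encoding, but only in an XML declaration on line 1.
    first, newline, rest = text.partition("\n")
    if first.startswith("<?xml"):
        first = first.replace("ISO-8859-1", "UTF-8")
    return (first + newline + rest).encode("utf-8")


# Sample input: an ISO-8859-1 header plus the bytes 0xFD and 0xFC.
latin1 = b'<?xml version="1.0" encoding="ISO-8859-1"?>\n<v>\xfdNN\xfcV</v>\n'
once = to_utf8_once(latin1)
twice = to_utf8_once(once)
print(once)           # header now says UTF-8, 0xFD became 0xC3 0xBD
print(once == twice)  # a second pass changes nothing
```

The heuristic can be fooled by ISO-8859-1 text whose bytes happen to
form valid UTF-8 sequences, so spot-check the results before
overwriting any originals.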
>
> ==================
>
>
>
> Does anyone foresee any problems or have any advice?
>
>
> Regards,
> Sonam Chauhan
> --
> Corporate Express Australia Ltd.
> Phone: +61-2-93350725, Email: [EMAIL PROTECTED]
> -----Original Message-----
> From: Sonam Chauhan
> Sent: Thursday, 25 October 2007 4:57 PM
> To: 'JMeter Users List'
> Subject: RE: extended ASCII character handling in JMeter/Java
>
> Thanks Sebb. You had asked:
>
> > > ------------------------
> > > ý ---> %C3%BD
> > > ü ---> %C3%BC
> > > ------------------------
> >
> > Are these the correct conversions?
>
> I don't really know. According to this page:
> http://www.albionresearch.com/misc/urlencode.php
> ... _these_ are the correct conversions:
> ý ---> %FD
> ü ---> %FC
>
> A Java application we use (webMethods) treats these conversions as
> interchangeable. It decodes both %FD and %C3%BD as ý.
>
> However, the web page above decodes %FD to ý ('y' with an acute accent), but
> decodes %C3%BD as Ã½ ('A' with a tilde and a "1/2" sign). Perhaps this
> is a UTF-8 multi-byte issue? I wish I had a UTF-8 editor that showed the hex
> representation of what one typed.
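Both sets of conversions are valid percent-encodings of the same
characters; they differ only in the charset used to turn each character
into bytes first. A minimal Python sketch of that (for illustration
only, not JMeter code; the helper name percent_encode is made up here):

```python
from urllib.parse import quote, unquote


def percent_encode(text, charset):
    """Encode text to bytes in the given charset, then percent-encode the bytes."""
    return quote(text, safe="", encoding=charset)


Y_ACUTE = "\u00fd"   # the character 'y' with an acute accent
U_UMLAUT = "\u00fc"  # the character 'u' with an umlaut

for name, ch in (("y-acute", Y_ACUTE), ("u-umlaut", U_UMLAUT)):
    # One byte per character in ISO-8859-1, two bytes in UTF-8.
    print(name, percent_encode(ch, "iso-8859-1"), percent_encode(ch, "utf-8"))

# Reading the two UTF-8 bytes as ISO-8859-1 yields two separate characters,
# which is the 'A with tilde' plus '1/2' pair the web page displays.
misread = unquote("%C3%BD", encoding="iso-8859-1")
print([hex(ord(c)) for c in misread])
```

So %FD is the ISO-8859-1 form and %C3%BD is the UTF-8 form; which one is
"correct" depends on the charset the server expects.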
>
> Kind regards,
> Sonam Chauhan
> --
> Corporate Express Australia Ltd.
> Phone: +61-2-93350725, Email: [EMAIL PROTECTED]
> -----Original Message-----
> From: sebb [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, 24 October 2007 9:22 PM
> To: JMeter Users List
> Subject: Re: extended ASCII character handling in JMeter/Java
>
> On 24/10/2007, Sonam Chauhan <[EMAIL PROTECTED]> wrote:
> > Thanks Sebb - we're still using JMeter 2.1.1 (our test harness is source
> > controlled as well). I'll push for an update to 2.3.
> >
> >
> > [ All the info below is from 2.1.1. ]
> >
> > 2.1.1 does not seem to have a file encoding property in
> > saveservice.properties.
>
> It was added later.
>
> > The post data is set up through parameters.
>
> OK.
>
> > I don't know how to obtain info about the HTTP request encoding used.
> > However, if you meant the 'Encode?' tickbox setting in the HTTP Sampler, it's
> > ticked -- hence data posted should be URL-encoded.
>
> The encoding field is another new field.
>
> > Here's how the special character data is stored in the .jmx (copy/pasted
> > by opening the JMX in Windows XP Notepad ... hope it's not mangled):
> > ------------------------
> > <jmeterTestPlan version="1.1" properties="1.2">
> > ...
> > <stringProp name="Argument.value">... ýNNüV... </stringProp>
> > ------------------------
> >
> > Here's how these characters are represented in the actual POST URL (viewed
> > through 'View Results Tree'):
> > ------------------------
> > ý ---> %C3%BD
> > ü ---> %C3%BC
> > ------------------------
>
> Are these the correct conversions?
>
> > I couldn't find the value for file.encoding in jmeter.log. However, I did
> > get this on both Windows and AIX:
>
> OK, that's a new log item - but one can get the value using the __P()
> function (or a trivial Java program).
>
> > ------------------------
> > 2007/10/24 11:02:14 INFO - jmeter.samplers.SampleResult: sampleresult.default.encoding is set to ISO-8859-1
> > ------------------------
> > This occurs when I set LANG to either en_AU or en_AU.utf8.
>
> Yes, that is a property you can set to override the default sample
> result encoding - which is used if the response content-type does not
> specify an encoding (charset). If not set, it defaults to ISO-8859-1,
> which is what the log message is showing.
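To spell out that fallback rule: the charset named in the response
Content-Type wins, and the default only applies when none is given. A
small Python sketch of the rule, for illustration only. It is not
JMeter's code, and the function name effective_charset is made up here.

```python
from email.message import Message

# Stands in for the sampleresult.default.encoding property.
DEFAULT_ENCODING = "ISO-8859-1"


def effective_charset(content_type, default=DEFAULT_ENCODING):
    """Return the charset declared in a Content-Type value, else the default."""
    if not content_type:
        return default
    header = Message()
    header["Content-Type"] = content_type
    # get_content_charset() returns the charset parameter in lower case,
    # or None when the header has no charset parameter.
    return header.get_content_charset() or default


print(effective_charset("text/html; charset=UTF-8"))  # declared charset is used
print(effective_charset("text/html"))                 # falls back to the default
```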
>
> > Kind regards,
> > Sonam Chauhan
> > --
> > Corporate Express Australia Ltd.
> > Phone: +61-2-93350725, Email: [EMAIL PROTECTED]
> > -----Original Message-----
> > From: sebb [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, 23 October 2007 10:01 PM
> > To: JMeter Users List
> > Subject: Re: extended ASCII character handling in JMeter/Java
> >
> > Which version of JMeter?
> >
> > There have been quite a lot of recent changes in the handling of JMX files.
> >
> > They now use UTF-8 by default - _file_encoding property in
> > saveservice.properties.
> >
> > Originally JMeter used the platform default encoding, which would tend
> > to cause problems when the JMX files are used on different systems.
> >
> > This was addressed in bug 36755, which was fixed in 2.3RC3.
> >
> > There have also been various other encoding fixes - see the changes file.
> >
> > How are you setting up the POST data?
> > Using files, or parameters?
> > What encoding are you using for the HTTP request?
> >
> > What is the value of the Java property "file.encoding" on the various
> > systems?
> > This is now shown in the jmeter log file.
> >
> > On 23/10/2007, Sonam Chauhan <[EMAIL PROTECTED]> wrote:
> > > Hello - I got hit by an old issue again this year, so wanted to ask
> > > about JMeter/Java handling of extended ASCII characters.
> > >
> > > I have some test cases that use extended ASCII characters 253 and 252 ('ý'
> > > and 'ü') as record separators in text data posted by the JMeter HTTP sampler.
> > > The test cases were created on Windows XP - the data was simply copy/pasted
> > > into the JMeter GUI.
> > >
> > > When these tests ran on Linux, I found the LANG environment variable had
> > > to be set as follows to make the tests work (email below from last year):
> > > LANG=en_AU
> > > This year, I moved to AIX and the tests failed again - the cause was that
> > > the 'ý' and 'ü' characters in the data were now posted as "?". I found
> > > the LANG variable on AIX was 'en_AU.utf8'. When set back to 'en_AU',
> > > the tests began posting the correct values.
> > >
> > > My question is what is causing this behavior - does Java or JMeter use
> > > data in JMX files differently depending on the LANG variable in UNIX?
> > >
> > > Kind regards,
> > > Sonam Chauhan
> > > --
> > > Corporate Express Australia Ltd.
> > > Phone: +61-2-93350725, Email: [EMAIL PROTECTED]
> > >
> > > _____________________________________________
> > > From: Sonam Chauhan
> > > Sent: Wednesday, 20 December 2006 2:27 PM
> > > To: 'JMeter Users List'
> > > Subject: JMeter under cron
> > >
> > > Just a cautionary tale of running JMeter through a cron job on a Linux
> > > system.
> > >
> > > We have a JMeter-based regression-test suite at work. This has run
> > > nightly for several years as a cron job. Recently, we added tests that post
> > > extended ASCII data (which has 'ý' and 'ü' record separators), and these
> > > sometimes passed and sometimes failed. After much debugging I found the new
> > > tests failed when automatically run by cron, but passed when run from an
> > > interactive terminal session.
> > >
> > > When executed in an interactive terminal session, LANG is set to:
> > > LANG=en_AU
> > > However, cron sets the Unix LANG environment variable to POSIX, i.e.:
> > > LANG=POSIX
> > > This seems to be causing the problem.
> > >
> > > I got the tests running by prefixing the test suite crontab entry with
> > > "export LANG=en_AU ;"
> > > i.e. the entry is now:
> > > 30 20 * * * export LANG=en_AU ; $HOME/runsuite.sh >> $HOME/tmp.out 2>&1
> > >
> > > Regards,
> > > Sonam Chauhan
> > >
> > > PS: 'locale -a' on the system shows that UTF-8 encoded English is also a
> > > supported LANG value:
> > > en_AU.utf8
> > > I guess this may be more pertinent for those whose test cases post
> > > binary data.
> > >
> > >
> > >
> >
>