Re: extended ASCII character handling in JMeter/Java

sebb Tue, 18 Mar 2008 06:19:12 -0700

On 18/03/2008, Sonam Chauhan <[EMAIL PROTECTED]> wrote:
> Hello List -
>
>  Just wanted some advice fixing an issue with JMX file encoding that has been 
> affecting me for some time (extended thread below).
>
>  Some test scripts written way back in JMeter 1.9 posted extended ASCII 
> characters (including 'ý' and 'ü') to a web server.
>
>  When migrating to JMeter 2.1.1, the scripts broke - JMeter would now send
>  incorrect data to the server. With Sebb's help, I found that setting the
>  LANG environment variable to 'en_AU' made them work properly again.
>
>  When migrating to JMeter 2.3.1, this workaround no longer worked. I then 
> changed the following property in <JMeter>/bin/saveservice.properties from 
> UTF-8 to ISO-8859-1:
>  --------------------------------
>  # Character set encoding used to read and write JMeter XML files
>  #
>  _file_encoding=ISO-8859-1
>  --------------------------------
>
>  The scripts now work properly again.
>
>  However, any new JMX scripts created now have this header:
>         <?xml version="1.0" encoding="ISO-8859-1"?>
>
>  Some of the scripts I created earlier have this in the header:
>         <?xml version="1.0" encoding="UTF-8"?>


Those headers record the value of the _file_encoding property that was
in effect when each script was saved.

>  Some scripts (the oldest) have no XML headers at all.
>

The header encoding was not written by JMeter 2.1.1.

>  The 'UTF-8' scripts work fine (even with '_file_encoding=ISO-8859-1') -- 
> probably since those tests only handle 'normal' ASCII data (no extended 
> characters).

They should work even if there are non-ASCII characters, so long as
they are UTF-8 encoded ...
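
A minimal Python sketch of the byte-level difference follows. Python is
used purely for illustration (nothing in JMeter or in this thread uses
it), and the helper name describe is made up. It shows the ISO-8859-1 and
UTF-8 bytes of the two characters, the URL-encoded form each produces,
and what UTF-8 bytes turn into when read back as ISO-8859-1:

```python
from urllib.parse import quote

def describe(ch):
    """Return the ISO-8859-1 and UTF-8 forms of one character."""
    return {
        "latin1_hex": ch.encode("iso-8859-1").hex(),
        "utf8_hex": ch.encode("utf-8").hex(),
        "latin1_url": quote(ch, encoding="iso-8859-1"),
        "utf8_url": quote(ch, encoding="utf-8"),
    }

for ch in ("\u00fd", "\u00fc"):   # y-acute and u-umlaut
    print(ascii(ch), describe(ch))

# Reading UTF-8 bytes with the wrong charset yields two characters
# (A-tilde followed by the one-half sign) instead of y-acute.
mojibake = "\u00fd".encode("utf-8").decode("iso-8859-1")
print(ascii(mojibake))
```

The two-byte UTF-8 forms are where %C3%BD and %C3%BC come from; the
single-byte ISO-8859-1 forms give %FD and %FC.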

The idea of adding the encoding header was to ensure portability of
scripts - once the script has been created in a particular encoding,
it should run OK regardless of what the file encoding property is set
to.

If that is not happening, then it is a bug that needs to be fixed ...
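
The portability idea can be sketched with any XML parser that honours
the encoding declaration. The Python example below is an illustration
only: it uses Python's standard XML parser, not JMeter's own loader, and
the sample value and the make_doc helper are invented for the example.

```python
import xml.etree.ElementTree as ET

TEXT = "\u00fdNN\u00fcV"   # sample value containing y-acute and u-umlaut

def make_doc(encoding):
    """Build a tiny XML document stored in the given encoding."""
    xml = ('<?xml version="1.0" encoding="%s"?>\n'
           '<stringProp>%s</stringProp>' % (encoding, TEXT))
    return xml.encode(encoding)

latin1_doc = make_doc("ISO-8859-1")
utf8_doc = make_doc("UTF-8")

# The bytes on disk differ ...
print(latin1_doc != utf8_doc)
# ... but a parser that reads the declaration recovers the same text.
print(ET.fromstring(latin1_doc).text == ET.fromstring(utf8_doc).text == TEXT)
```

Because each file names its own encoding, the reader does not need to
consult any external setting to decode it correctly.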

>  For uniformity, and flexibility in future testing I wanted to bulk-convert 
> all existing JMeter scripts to use the 'UTF-8' encoding only. Here's what I 
> plan to do:
>
>  THE CUNNING PLAN!
>  ==================
>
>  1.      Convert all JMX scripts to UTF-8
>  I plan to use the following Perl one-liner as a Perl UTF-8 conversion 
> utility:
>  --------------------------------
>  [ After installing the 'Unicode::MapUTF8' CPAN module ]
>
>  cat ISO-8859-1.jmx | perl -lne ' use Unicode::MapUTF8 qw(to_utf8); print 
> to_utf8({ -string => $_, -charset => "ISO-8859-1" });' > UTF-8.jmx

No need to use cat, you can give the filename directly to Perl:

perl -lne ' use Unicode:...' ISO-8859-1.jmx > UTF-8.jmx

But see below.

>  --------------------------------
>
>  2.      Change the _file_encoding property back to UTF-8
>
>  3.      Manually change any 'ISO-8859-1' XML headers in JMX scripts to 
> 'UTF-8'

Or use Perl to do it.

It's probably easier to use a Perl script to do this all in one go;
something like:

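#=== iso2utf8.pl =====
use strict;
use warnings;
use Unicode::MapUTF8 qw(to_utf8);

# Read the first line - assumed to be the <?xml header.
my $header = <>;
exit 0 unless defined $header;    # empty input: nothing to convert
# Relabel the declared encoding to match the converted content.
$header =~ s/ISO-8859-1/UTF-8/;
print $header;

# Convert the rest of the file from ISO-8859-1 to UTF-8.
while (my $line = <>) {
    print to_utf8({ -string => $line, -charset => "ISO-8859-1" });
}
#=========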

$ perl -w iso2utf8.pl ISO-8859-1.jmx > UTF-8.jmx
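
For those without the CPAN module to hand, the same conversion can be
sketched in Python. This is an assumption-laden equivalent, not a tested
replacement for the Perl script: the function name iso_to_utf8 is made
up, and unlike the Perl version it only rewrites the first line if it
actually starts with <?xml and matches the encoding name regardless of
case.

```python
import re

def iso_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 and relabel the XML header."""
    text = data.decode("iso-8859-1")
    head, sep, rest = text.partition("\n")
    # Only touch the declaration on the first line.
    if head.startswith("<?xml"):
        head = re.sub("ISO-8859-1", "UTF-8", head, count=1,
                      flags=re.IGNORECASE)
    return (head + sep + rest).encode("utf-8")

# In-memory demonstration; for real files, read and write in binary
# mode ("rb" / "wb") so no other encoding is applied on the way.
sample = b'<?xml version="1.0" encoding="ISO-8859-1"?>\n<a>\xfd\xfc</a>\n'
print(iso_to_utf8(sample))
```

Note that either version should only be run on files known to be
ISO-8859-1: running it on a file that is already UTF-8 would
double-encode the non-ASCII characters.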

>
>  ==================
>
>
>
>  Does anyone foresee any problems or have any advice?
>
>
>  Regards,
>  Sonam Chauhan
>  --
>  Corporate Express Australia Ltd.
>  Phone: +61-2-93350725, Email: [EMAIL PROTECTED]
>  -----Original Message-----
>  From: Sonam Chauhan
>  Sent: Thursday, 25 October 2007 4:57 PM
>  To: 'JMeter Users List'
>  Subject: RE: extended ASCII character handling in JMeter/Java
>
>  Thanks Sebb. You had asked:
>
>  > > ------------------------
>  > > ý ---> %C3%BD
>  > > ü ---> %C3%BC
>  > > ------------------------
>  >
>  > Are these the correct conversions?
>
>  I don't really know. According to this page:
>         http://www.albionresearch.com/misc/urlencode.php
>  ... _these_ are the correct conversions:
>         ý ---> %FD
>         ü ---> %FC
>
>  A Java application we use (webMethods) treats these conversions as
>  interchangeable. It decodes both %FD and %C3%BD as ý.
>
>  However, the web page above decodes %FD to ý ('y' with an acute accent), but
>  decodes %C3%BD as Ã½ ('A' with a tilde and a "1/2" sign). Perhaps this
>  is a UTF-8 multi-byte issue? I wish I had a UTF-8 editor that showed the hex
>  representation of what one typed.
>
>  Kind regards,
>  Sonam Chauhan
>  --
>  Corporate Express Australia Ltd.
>  Phone: +61-2-93350725, Email: [EMAIL PROTECTED]
>  -----Original Message-----
>  From: sebb [mailto:[EMAIL PROTECTED]
>  Sent: Wednesday, 24 October 2007 9:22 PM
>  To: JMeter Users List
>  Subject: Re: extended ASCII character handling in JMeter/Java
>
>  On 24/10/2007, Sonam Chauhan <[EMAIL PROTECTED]> wrote:
>  > Thanks Sebb - we're still using JMeter 2.1.1 (our test harness is source 
> controlled as well). I'll push for an update to 2.3.
>  >
>  >
>  > [ All the info below is from 2.1.1. ]
>  >
>  > 2.1.1 does not seem to have a file encoding property in 
> saveservice.properties.
>
>  It was added later.
>
>  > The post data is set up through parameters.
>
>  OK.
>
>  > I don't know how to obtain info about the HTTP request encoding used. 
> However, if you meant the 'Encode?' tickbox setting in the HTTP Sampler, it's 
> ticked -- hence data posted should be URL-encoded.
>
>  The encoding field is another new field.
>
>  > Here's how the special character data is stored in the .jmx (copy/pasted 
> by opening the JMX in Windows XP Notepad ... hope it's not mangled):
>  > ------------------------
>  > <jmeterTestPlan version="1.1" properties="1.2">
>  >        ...
>  >   <stringProp name="Argument.value">... ýNNüV... </stringProp>
>  > ------------------------
>  >
>  > Here's how these characters are represented in the actual POST URL (viewed 
> through 'View Results Tree')
>  > ------------------------
>  > ý ---> %C3%BD
>  > ü ---> %C3%BC
>  > ------------------------
>
>  Are these the correct conversions?
>
>  > I couldn't find the value for file.encoding in jmeter.log. However, I did 
> get this on both Windows and AIX:
>
>  OK, that's a new log item - but one can get the value using the __P()
>  function (or a trivial Java program).
>
>  > ------------------------
>  > 2007/10/24 11:02:14 INFO  - jmeter.samplers.SampleResult: 
> sampleresult.default.encoding is set to ISO-8859-1
>  > ------------------------
>  > This occurs when I set LANG to either en_AU or en_AU.utf8.
>
>  Yes, that is a property you can set to override the default sample
>  result encoding - which is used if the response content-type does not
>  specify an encoding (charset). If not set, it defaults to ISO-8859-1,
>  which is what the log message is showing.
>
>  > Kind regards,
>  > Sonam Chauhan
>  > --
>  > Corporate Express Australia Ltd.
>  > Phone: +61-2-93350725, Email: [EMAIL PROTECTED]
>  > -----Original Message-----
>  > From: sebb [mailto:[EMAIL PROTECTED]
>  > Sent: Tuesday, 23 October 2007 10:01 PM
>  > To: JMeter Users List
>  > Subject: Re: extended ASCII character handling in JMeter/Java
>  >
>  > Which version of JMeter?
>  >
>  > There have been quite a lot of recent changes in the handling of JMX files.
>  >
>  > They now use UTF-8 by default - _file_encoding property in
>  > saveservice.properties.
>  >
>  > Originally JMeter used the platform default encoding, which would tend
>  > to cause problems when JMX files are used on different systems.
>  >
>  > This was addressed in bug 36755, which was fixed in 2.3RC3.
>  >
>  > There have also been various other encoding fixes - see the changes file.
>  >
>  > How are you setting up the POST data?
>  > Using files, or parameters?
>  > What encoding are you using for the HTTP request?
>  >
>  > What is the value of the Java property "file.encoding" on the various 
> systems?
>  > This is now shown in the jmeter log file.
>  >
>  > On 23/10/2007, Sonam Chauhan <[EMAIL PROTECTED]> wrote:
>  > > Hello - I got hit by an old issue again this year, so wanted to ask
>  > > about JMeter/Java handling of extended ASCII characters.
>  > >
>  > > I have some test cases that use extended ASCII characters 253 and 252
>  > > ('ý' and 'ü') as record separators in text data posted by the JMeter
>  > > HTTP sampler. The test cases were created on Windows XP - the data was
>  > > simply copy/pasted into the JMeter GUI.
>  > >
>  > > When these tests ran on Linux, I found the LANG environment variable had
>  > > to be set as follows to make the tests work (email below from last year):
>  > > LANG=en_AU
>  > > This year I moved to AIX and the tests failed again - the cause was
>  > > that the 'ý' and 'ü' characters in the data were now posted as a "?". I
>  > > found the LANG variable on AIX was 'en_AU.utf8'. When set back to 'en_AU',
>  > > the tests began posting the correct values.
>  > >
>  > > My question is what is causing this behavior - does Java or JMeter use 
> data in JMX files differently depending on the LANG variable in UNIX?
>  > >
>  > > Kind regards,
>  > > Sonam Chauhan
>  > > --
>  > > Corporate Express Australia Ltd.
>  > > Phone: +61-2-93350725, Email: [EMAIL PROTECTED]
>  > >
>  > > _____________________________________________
>  > > From: Sonam Chauhan
>  > > Sent: Wednesday, 20 December 2006 2:27 PM
>  > > To: 'JMeter Users List'
>  > > Subject: JMeter under cron
>  > >
>  > > Just a cautionary tale of running JMeter through a cron job on a Linux
>  > > system.
>  > >
>  > > We have a JMeter-based regression-test suite at work. This has run
>  > > nightly for several years as a cron job. Recently, we added tests that
>  > > post extended ASCII data (which has 'ý' and 'ü' record separators), and
>  > > these sometimes passed and sometimes failed. After much debugging I found
>  > > the new tests failed when run automatically by cron, but passed when run
>  > > from an interactive terminal session.
>  > >
>  > > When executed in an interactive terminal session, LANG is set to:
>  > >        LANG=en_AU
>  > > However, cron sets the Unix LANG environment variable to POSIX, i.e.:
>  > > LANG=POSIX
>  > > This seems to be causing the problem.
>  > >
>  > > I got the tests running by prefixing the test suite crontab entry with
>  > > "export LANG=en_AU ;", i.e. the entry is now:
>  > > 30 20 * * * export LANG=en_AU ; $HOME/runsuite.sh >> $HOME/tmp.out 2>&1
>  > >
>  > > Regards,
>  > > Sonam Chauhan
>  > >
>  > > PS: 'locale -a' on the system shows that UTF-8 encoded English is also a
>  > > supported LANG value:
>  > > en_AU.utf8
>  > > I guess this may be more pertinent for those whose test cases post
>  > > binary data.
>  > >
>  > >
>  > >
>  >
>
