Re: [Barry-devel] bsmstool for SMS parsing

Chris Frey Wed, 08 Apr 2009 15:55:08 -0700

On Sun, Apr 05, 2009 at 04:09:49PM +0800, Ryan Li wrote:
> Hello again Chris,


Hi Ryan, thanks for your continued work on SMS.


> I've partly rewrote the parser, solved the encoding problem. But I need 
> to use the class function IConverter::Convert(iconv.h), hence I modified 
> it to public. If you would like to keep it private, I could copy this 
> function to SMS parser.
> Another thing, is it appropriate to include the errno header? I 
> simulated the creation of an iconv_t variable in iconv.cc.

Sometimes it is needed to include <errno.h>.. yep, that's fine.


> But however, I think that there should be some improvement on Barry's 
> iconv, since BlackBerry doesn't entirely use WINDOWS-1252 as its 
> charset(not even in contacts with non-Latin names, don't know what they 
> use), maybe it should be the programmers to decide what encoding to 
> convert from, and the users to decide what encoding to convert to. And 
> since UCS-2 is rarely used in computer, I ignored the ic's indication of 
> what encoding to convert to and chose UTF-8.

I agree with you that IConverter needs an update, but I disagree with
the solution.

I want to keep IConverter::Convert() private, in order to hide all iconv_t
handle types.  In your r_sms.cc code, there is already one leak of an
iconv_t handle, so I'd much rather have all calls to iconv_open() and
iconv_close() be kept inside the IConverter class and supporting classes.

We can't ignore ic's indication of what encoding to convert to, since this
is specified by the application (and therefore, the user).  This is our
only indication of what the user wants to see.

I've updated the API so that you can create custom handles in the conversion
direction you wish to go, and based on IConverter's user-specified 'tocode'.
So, for example, in your r_sms.cc code, instead of:

                        if (DataCodingScheme == UCS2)
                        {
                                iconv_t ucs2utf8;
                                ucs2utf8 = iconv_open("UTF-8", "UCS-2");
                                if (ucs2utf8 == (iconv_t)(-1))
                                        throw ErrnoError("iconv_open failed 
(ucs2 to utf8)", errno);
                                IConverter iconv_ucs("UTF-8", true);

                                Body = UCS2Preprocess(Body);
                                Body = iconv_ucs.Convert(ucs2utf8, Body);
                        }

You should now use something like:

                        if (DataCodingScheme == UCS2 ) {
                                Body = UCS2Preprocess(Body);
                                if( ic ) {
                                        IConvHandle ucs2("UCS-2", *ic);
                                        Body = ic->Convert(ucs2, Body);
                                }
                        }

There are new documentation comments in src/iconv.h which should clarify
the usage.

If the application / user does not give us an IConverter object, then just
pass through all data without change.  It may look like garbage, but the
user asked for it.

I'm curious about the preprocess function... it looks like it's just a
big / little endian issue... I'm surprised that the iconv libraries can't
handle that... maybe we're specifying the encoding incorrectly?
Looking at http://en.wikipedia.org/wiki/UTF-16, under "Byte order encoding
schemes" it looks like we should just prepend a Byte Order Mark (BOM)
and then let iconv handle it.


> And then, timestamps. The two timestamps are taken directly from 
> BlackBerry, accurate to milliseconds, I don't want to lose data, some 
> developers may also be interested in this. Hence I wrote two functions 
> returning time_t for developers to get timestamps, and added comment 
> where they are declared.

If there are functions to get time_t, there should be corresponding functions
to set it via time_t as well.


> And there's one thing I'm curious about, that how could a programmer 
> using libbarry write records back to BlackBerry using this parser? I 
> thought that the parsers are only for reading records from BlackBerry?

See the code in src/r_contact.cc.  There are two sets of functions in all
record classes.  One is ParseHeader() and ParseFields().  The other is
BuildHeader() and BuildFields().

Not all record classes have the Build*() functions implemented yet, but
that's how it is done.


> Still don't know how to submit a patch through Git???. So again, attached 
> them here.

See http://www.netdirect.ca/software/packages/barry/patches.php for
docs on how to submit patches.  With git, you can create a fork on
repo.or.cz and push your changes there, or even just push to the mob
branch, or create your changes on a git branch of your own, and
then do:

        git diff master mybranch

If you've taken the time to make the patches in small chunks, and have
a series of them to submit, use git-format-patch:

        git format-patch master

and post the resulting 000*.patch files to the mailing list.

Feel free to ask any questions on how to work with git and Barry.  I'm a git
fan, and am always looking for ways to make things easier for contributors
and myself, when dealing with patches.

When you have that figured out, please resubmit your patch, and include
your s11n-boost.h changes needed to get everything to compile.  They were
missing and it didn't fully compile here.

Thanks!
- Chris


------------------------------------------------------------------------------
This SF.net email is sponsored by:
High Quality Requirements in a Collaborative Environment.
Download a free trial of Rational Requirements Composer Now!
http://p.sf.net/sfu/www-ibm-com
_______________________________________________
Barry-devel mailing list
Barry-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/barry-devel

Re: [Barry-devel] bsmstool for SMS parsing

Reply via email to