Re: Long initialisation for utf8* encodings

2008-02-09 Thread Abdelrazak Younes
Abdelrazak Younes wrote: Hmm, it seems that I was misled by the slower debug version. In release mode, the set construction is noticeable with utf8, but that's not all of it apparently... Some numbers for updateSymbolList() obtained with the attached patch with a document with 'utf8'

Re: Long initialisation for utf8* encodings

2008-02-09 Thread Abdelrazak Younes
Jürgen Spitzmüller wrote: Abdelrazak Younes wrote: No, the problem lies in inserting the symbols from the unicodesymbols file. For utf8, we shouldn't do that because _all_ symbols are already in there. On each insertion, std::set has to check whether the given symbol is already present;

Re: Long initialisation for utf8* encodings

2008-02-09 Thread Jürgen Spitzmüller
Abdelrazak Younes wrote: > > Try and see what happens if you bypass the unicodesymbols part if > > start_encodable_ == ucs4_max. > > I saw in the UserGuide that only "utf8-plain" should never make use of > the 'unicodesymbols' file; is that what you mean? No, I just meant: try it out. It might indeed
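
The bypass suggested in this message could look roughly like the sketch below. It is not the LyX implementation: `char_type`, `CharSet`, `ucs4_max` and the free function `symbolsList()` are hypothetical stand-ins modelled on the names used in the thread, and the `unicode_symbols` vector merely stands for whatever was read from the unicodesymbols file. The only point it illustrates is the early return when every code point is already covered.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-ins for the types named in the thread.
using char_type = std::uint32_t;
using CharSet = std::set<char_type>;

// One past the highest Unicode code point (0x110000 == 1114112).
constexpr char_type ucs4_max = 0x110000;

// Builds the symbol list for an encoding.
//  - encodable:       characters known to be encodable
//  - start_encodable: every code point below this value is encodable too
//  - unicode_symbols: code points listed in the unicodesymbols file (assumed)
CharSet symbolsList(CharSet const & encodable,
                    char_type start_encodable,
                    std::vector<char_type> const & unicode_symbols)
{
    assert(start_encodable <= ucs4_max);

    // first all encodable characters
    CharSet symbols = encodable;

    // add those below start_encodable
    for (char_type c = 0; c < start_encodable; ++c)
        symbols.insert(c);

    // Bypass: the set already holds every code point, so the entries
    // from the file cannot add anything and need not be looked up.
    if (start_encodable == ucs4_max)
        return symbols;

    for (char_type c : unicode_symbols)
        symbols.insert(c);
    return symbols;
}
```

As the follow-ups in the thread note, this only saves the lookups for the file entries; building the full set remains the larger cost.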

Re: Long initialisation for utf8* encodings

2008-02-09 Thread Abdelrazak Younes
Jürgen Spitzmüller wrote: > Abdelrazak Younes wrote: > > > Try and see what happens if you bypass the unicodesymbols part if > > > start_encodable_ == ucs4_max. > > I saw in the UserGuide that only "utf8-plain" should never make use of the 'unicodesymbols' file; is that what you mean? > No, I just meant: try it

Re: Long initialisation for utf8* encodings

2008-02-09 Thread Jürgen Spitzmüller
Abdelrazak Younes wrote: > Oh I did, and it improves things a bit indeed but see my other mail > instead. I saw it. But I don't have time to look at it ATM. Jürgen

Long initialisation for utf8* encodings

2008-02-08 Thread Abdelrazak Younes
Juergen, There is something fishy in this method. set<char_type> Encoding::getSymbolsList() const { // assure the used encoding is properly initialized init(); // first all encodable characters CharSet symbols = encodable_; // add those below start_encodable_

Re: Long initialisation for utf8* encodings

2008-02-08 Thread Andre Poenitz
On Fri, Feb 08, 2008 at 08:04:56PM +0100, Abdelrazak Younes wrote: Juergen, There is something fishy in this method. set<char_type> Encoding::getSymbolsList() const { // assure the used encoding is properly initialized init(); // first all encodable characters

Re: Long initialisation for utf8* encodings

2008-02-08 Thread Abdelrazak Younes
Abdelrazak Younes wrote: Juergen, There is something fishy in this method. set<char_type> Encoding::getSymbolsList() const { // assure the used encoding is properly initialized init(); // first all encodable characters CharSet symbols = encodable_; // add those below

Re: Long initialisation for utf8* encodings

2008-02-08 Thread Abdelrazak Younes
Abdelrazak Younes wrote: Abdelrazak Younes wrote: Juergen, There is something fishy in this method. set<char_type> Encoding::getSymbolsList() const { // assure the used encoding is properly initialized init(); // first all encodable characters CharSet symbols = encodable_;

Re: Long initialisation for utf8* encodings

2008-02-08 Thread Abdelrazak Younes
Andre Poenitz wrote: On Fri, Feb 08, 2008 at 08:04:56PM +0100, Abdelrazak Younes wrote: Juergen, There is something fishy in this method. set<char_type> Encoding::getSymbolsList() const { // assure the used encoding is properly initialized init(); // first all encodable

Re: Long initialisation for utf8* encodings

2008-02-08 Thread Jürgen Spitzmüller
Abdelrazak Younes wrote: > No, the problem lies in inserting the symbols from the > unicodesymbols file. For utf8, we shouldn't do that because _all_ > symbols are already in there. On each insertion, std::set has to > check whether the given symbol is already present; as you have 1114112
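
The lookup described in the quoted text can be made visible with a small sketch. It assumes nothing about the LyX code beyond a `std::set` of code points; `char_type`, `CharSet` and the helper `addSymbols()` are hypothetical names used only for illustration. `std::set::insert` returns a pair whose `second` member is false when the element was already present, which means the tree was searched without anything being added.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-ins for the types named in the thread.
using char_type = std::uint32_t;
using CharSet = std::set<char_type>;

// Inserts every candidate into symbols and returns how many of them
// were genuinely new. Each call to insert() searches the tree, even
// when the candidate turns out to be present already.
std::size_t addSymbols(CharSet & symbols,
                       std::vector<char_type> const & candidates)
{
    std::size_t added = 0;
    for (char_type c : candidates) {
        // .second is true only if c was not in the set before
        if (symbols.insert(c).second)
            ++added;
    }
    return added;
}
```

When the set already contains every code point, as in the utf8 case discussed here, such a helper would return 0 for any input: all of the searches are wasted work.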